Development of Higher Order Thinking Skills Test based on Revised Bloom Taxonomy

ABSTRACT

The unequal number of questions that test students' lower- and higher-order thinking skills is a concern in mathematics learning evaluation. This imbalance has resulted in low cognitive levels among students. The purpose of this study is to develop questions that can be used to assess the higher-order thinking skills of senior high school students and that fulfill the requirements of being valid, practical, and effective. This study is developmental research that employs the formative evaluation model. The data were gathered using validation sheets, student response sheets, and a test that measures students' higher-order thinking skills. According to the findings, the questions developed met the valid, practical, and effective criteria. Validity is based on the validators' assessments: both validators agreed that the questions are good and can be used with a few adjustments. Practicality is based on the student response sheets: 88 percent of students responded positively to the questions, above the 85 percent threshold. Effectiveness is based on students' ability to answer the questions: according to the test results, 88 percent of students satisfied the minimum completeness criterion, above the 85 percent threshold, indicating that the questions were effective. Teachers can use the developed questions to evaluate students' higher-order thinking skills, so that they do not only provide questions that measure lower-order thinking skills. In addition, the developed questions can serve as a reference for writing further questions that measure students' higher-order thinking skills.
pupils can only answer math problems at a cognitive level of remembering. Students can only recall knowledge stored in long-term memory, which is then utilized to solve issues. Therefore they rely solely on memory.
Previous research also revealed that students' cognitive levels were still dominated by low cognitive levels, especially remembering and understanding (Barut & Wijaya, 2021), while the cognitive levels of analyzing, evaluating, and creating were still deficient (Ayu Rahayu, 2018). Other studies likewise reported that students' high cognitive level remained a concern. Megawati et al. (2019) showed that students' overall high cognitive level is very concerning: the analyzing indicator is in the low category, while the evaluating and creating indicators are in the deficient category. Ichsan et al. (2019) obtained the same results. Furthermore, Mandini & Hartono (2018) showed that students' high cognitive level falls in the moderate category and that students still have difficulty making generalizations. Situmorang et al. (2020) also state that students have difficulty with questions at the level of evaluating and creating, so students' cognitive level for evaluating and creating is still lacking. Rahayu et al. (2021) similarly found that students' high cognitive level was still low, especially for the evaluating and creating indicators.
Students' low cognitive level, especially for higher-order thinking skills, is caused by several factors that make it challenging for them to solve problems at high cognitive levels. Questions that measure higher-order thinking skills are contextual questions based on everyday life (narrative questions), and this becomes an obstacle for students who do not like narrative questions (Alhassora et al., 2017; Hadi et al., 2018; Retnawati et al., 2017). In addition, students face difficulties in planning how to solve problems, developing mathematical models from contextual problems, determining which formulas are suitable, applying formulas, connecting information and applying strategies to solve problems, and performing manipulations, as well as limitations in information literacy (Hadi et al., 2018; Tanudjaya & Doorman, 2020).
The low cognitive level of students is also caused by teachers rarely giving questions that measure higher-order thinking skills. The practice questions given by teachers are the questions in the mathematics textbook. Previous research stated that students' mathematics textbooks were dominated by questions that measure lower-order thinking skills, namely applying (Klorina et al., 2021), so that the distribution of questions across cognitive levels is not proportional. Furthermore, Cahyono & Adilah (2016) found that 16.98 percent of the questions in students' mathematics textbooks were at the cognitive level of remembering and 53.77 percent were at the cognitive level of applying, both of which are part of lower-order thinking skills, while 29.25 percent were at the cognitive level of reasoning, which is part of higher-order thinking skills.
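The imbalance reported by Cahyono & Adilah (2016) can be made explicit with a short calculation. The following is only an illustrative sketch: the percentages are those quoted above, while the variable names and the lower-to-higher ratio are added here for illustration and do not come from the cited study.

```python
# Percentages of textbook questions as quoted from Cahyono & Adilah (2016).
shares = {
    "remembering": 16.98,  # lower-order thinking skills
    "applying": 53.77,     # lower-order thinking skills
    "reasoning": 29.25,    # higher-order thinking skills
}

# Combined share of lower-order questions, rounded to avoid float noise.
lower_order = round(shares["remembering"] + shares["applying"], 2)
higher_order = shares["reasoning"]

# Sanity check that the three reported shares cover all questions.
total = round(sum(shares.values()), 2)

# Illustrative ratio of lower-order to higher-order questions.
ratio = round(lower_order / higher_order, 2)

print(f"lower-order: {lower_order}%, higher-order: {higher_order}%")
print(f"total: {total}%, ratio lower/higher: {ratio}")
```

Under these figures, roughly seven in ten textbook questions target lower-order thinking skills.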
The disproportionate number of questions that evaluate lower- and higher-order thinking skills is not offset by teachers' capacity to construct tests that measure higher-order thinking skills. Teachers understand how vital it is to train students on higher-order thinking skills questions, but there are still misunderstandings about how to create such questions. From the teachers' perspective, questions that measure high cognitive levels must be difficult. Moreover, this misunderstanding is not accompanied by teachers' concern for studying and designing higher-order thinking skills questions (Ramdiah et al., 2019). Teachers also have difficulty designing questions at a high cognitive level. This difficulty is shown by daily test questions written by teachers that only reach the cognitive levels of remembering, understanding, and applying (Amelia et al., 2016; Himmah et al., 2019). In addition, research by Meldawati et al. (2020) and Febrilia (2019) shows that the questions written by teachers are dominated by questions that measure low cognitive levels. Both in-service and pre-service teachers experience problems in designing questions that measure high cognitive levels (Listiani & Sulistyorini, 2020; Samo, 2017; Tanujaya & Mumu, 2020). Therefore, conducting assessments that measure high cognitive levels is still an obstacle for teachers (Afifah & Retnawati, 2019; Retnawati et al., 2016).
Several researchers have developed questions that measure students' cognitive levels, especially higher-order thinking skills, but the questions developed are only for junior high school students (Fadlila & Sagala, 2021; Husna et al., 2018; Kusaeri et al., 2018; Muklis et al., 2018; Sagala & Andriani, 2019; Yunita et al., 2018). Questions that measure the higher-order thinking skills of senior high school students have not been developed before, especially on function composition material. The difference between previous research and the current research is therefore that this research develops questions that measure the higher-order thinking skills of senior high school students, specifically on function composition material.
Due to the imbalance in the number of questions that measure lower- and higher-order thinking skills in students' mathematics textbooks, and the absence of questions that measure students' higher-order thinking skills on function composition material, the researchers set out to develop mathematics problems on function composition for senior high school students that measure higher-order thinking skills. The purpose of this study is to produce questions on function composition that measure the higher-order thinking skills of senior high school students and meet the criteria of being valid, practical, and effective. Teachers can use the developed questions to measure students' higher-order thinking skills, and the questions can serve as a reference for teachers who find it difficult to write questions that measure higher-order thinking skills.

B. METHODS
Research and development with a formative evaluation model was used as the research approach. Questions were developed in two stages: the preliminary stage and the formative evaluation stage. This study employed the formative evaluation steps provided by Tessmer (1998), which include (1) self-evaluation, (2) prototyping (expert review, one-to-one, and small group), and (3) field testing. A chart of the formative evaluation model is shown in Figure 1. The research began with the preliminary stage, in which the researchers determined the location and the research subjects. Next, in the self-evaluation stage, the researchers developed the questions to be tested. The questions were aligned with the basic competencies prepared by the Ministry of Education and Culture. Once the questions had been designed, the researchers asked two experts to assess whether the questions were valid and to suggest improvements. Along with the expert reviews, the researchers tested the instrument on three students who were not research subjects. This stage is referred to as one-to-one. In addition to solving the questions, these students were asked to comment on them. After receiving comments from the experts and students, the researchers revised the questions and then tested them at the small group stage. In the small group stage, 6 students were asked to complete the questions and comment on them. If students suggested improvements, the researchers revised the questions; if not, the researchers proceeded directly to the field test stage. At the field test stage, students were asked to solve the questions and provide responses to the questions that had been developed.
The subjects of this study were 25 public senior high school students in Tebing Tinggi City. The students were aged 16-17 and had varying mathematical ability. Data were collected using a higher-order thinking skills test and validation sheets completed by experts. Nine questions were developed: 6 multiple choice questions and 3 essay questions.

C. RESULT AND DISCUSSION

1. Preliminary
The first stage that the researchers went through was the preliminary stage. The researchers decided the location and study subjects by contacting the mathematics teacher in order to fit the research timetable to the mathematics learning schedule. At this stage, the researchers asked a mathematics teacher from one of Tebing Tinggi's senior high schools whether he would allow the study to be conducted in his class. The preliminary study took place in the even semester of the 2020/2021 academic year, with students from class X MIPA 1 as research participants.

Self Evaluation
Self evaluation is the first stage in the formative evaluation process. At this stage, the researchers examined the characteristics of the students, the school's curriculum, the books used, and the design of the questions. The students in this study had studied function composition. The school follows the 2013 curriculum, and the textbook is the mathematics textbook for class X SMA/MA produced by the Ministry of Education and Culture. Following the analysis, the researchers wrote the test questions. The researchers' design before expert validation is the initial product (prototype 1). It comprises 9 questions (6 multiple choice questions and 3 essay questions), so that each indicator of higher-order thinking skills is covered by 2 multiple choice questions and 1 essay question. An example of a question created for prototype 1, at the level of creating (C6), is shown in Figure 2.
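The structure of prototype 1 can be sketched as a small test blueprint. This is an illustrative sketch only: the indicator names follow the three higher-order levels of the revised Bloom taxonomy, the codes C4 and C5 are the conventional labels assumed here alongside the C6 named above, and `build_blueprint` is a hypothetical helper rather than anything used in the study.

```python
from collections import Counter

# Higher-order indicators; C4 and C5 are assumed conventional codes,
# C6 (creating) is the one named explicitly in the text.
INDICATORS = ["analyzing (C4)", "evaluating (C5)", "creating (C6)"]


def build_blueprint(mc_per_indicator=2, essay_per_indicator=1):
    """Return a list of (indicator, item_type) pairs for the test."""
    items = []
    for indicator in INDICATORS:
        items += [(indicator, "multiple choice")] * mc_per_indicator
        items += [(indicator, "essay")] * essay_per_indicator
    return items


blueprint = build_blueprint()

# Tally items by type and by indicator to confirm the reported totals.
by_type = Counter(item_type for _, item_type in blueprint)
by_indicator = Counter(indicator for indicator, _ in blueprint)

print(len(blueprint), dict(by_type))
print(dict(by_indicator))
```

With 2 multiple choice questions and 1 essay question per indicator, the blueprint reproduces the reported totals of 9 questions, 6 multiple choice and 3 essay.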

Expert Reviews
After writing the questions to be tested, the researchers carried out the next stage, the expert review. The developed questions are said to be valid if there is a strong theoretical rationale and internal consistency. The data collection instrument used to determine the validity of the questions was a validation sheet completed by experts. Expert reviews were conducted to determine the level of validity of the questions. The experts evaluated the developed questions in terms of substance, structure, and language, and also made suggestions to improve them. The expert reviews were conducted over WhatsApp. The validators were two academics from Yogyakarta State University's Department of Mathematics Education. The outcomes of expert validation are shown in Table 1. The validators' recommendations to the researchers are shown in Table 2.

Table 2. Validator recommendations
Add an illustration to the problem that matches the story.
Make clear distinctions between fried chicken, french fries, and soft drink.
Substitute x for fried chicken, y for french fries, and z for soft drink.
The function that states the price of fried chicken, french fries, and soft drinks should be presented in the form of a table.
Change the question from story form to tabular form.
Based on Table 1, the instrument developed to measure students' higher-order thinking skills is declared valid according to the validators' assessments. Validator 1 gave an average score of 4.7 and validator 2 gave an average score of 4.5, and each score places the instrument in the valid category. The overall average of the two validators is 4.6, so the instrument is declared valid. Therefore, the process of developing questions could continue to the next stage, although the validators gave some suggestions to improve the instrument further. The suggestions listed in Table 2 are to use terms consistently across the questions, recheck the calculations in the solution procedure, use illustrations for contextual questions, use different symbols for different types of fast food, and present the questions with information graphics such as a table or a chart. After receiving the validators' suggestions, the researchers revised the instrument to be tested at the one-to-one stage.
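The overall validator score is a simple mean of the two validators' averages. A minimal sketch of that arithmetic follows; the dictionary keys are illustrative labels, and no validity cutoff is encoded because the text does not state one.

```python
from statistics import mean

# Average scores reported for each validator in Table 1.
validator_scores = {"validator 1": 4.7, "validator 2": 4.5}

# Overall average across validators, rounded to one decimal place
# to match the precision reported in the text.
overall = round(mean(list(validator_scores.values())), 1)

print(f"overall validator average: {overall}")
```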

One-to-One
While the expert review was under way, the researchers tested prototype 1 on three students who were not study subjects: one student with high mathematical ability, one with moderate mathematical ability, and one with low mathematical ability. The three students are from one of Tebing Tinggi's public senior high schools. In addition to solving the questions, the students provided comments or suggestions on them. The results of the instrument test on these three students are shown in Table 3, and the comments they gave are shown in Table 4.

Table 4. Student comments at the one-to-one stage
S1. Comment: none.
S2. Comment: There are typing errors in some questions. Revision: Fixed typing errors in some questions.
S3. Comment: Some questions are difficult to understand. Revision: Sentences are arranged in a simpler way so that students can understand easily.
Based on Table 3, the student with high mathematical ability got a score of 88 and was categorized as good. The student with moderate mathematical ability got a score of 80 and the student with low mathematical ability got a score of 76, so both are in the enough category. In addition to answering the developed questions, the students commented as shown in Table 4. Student 1 did not comment on the questions, student 2 commented that there were typos in some questions, and student 3 commented that some questions were difficult to understand. After receiving comments from the validators and students, the researchers revised prototype 1. The revised prototype 1 was called prototype 2. The results of prototype 2 are shown in Figure 3.

Small Group
Following the creation of prototype 2, the researchers tested it on six students. This stage is known as the small group stage. The students involved were from one of the public senior high schools and were not subjects of the study. They included two students with high mathematical ability, two with moderate mathematical ability, and two with low mathematical ability. The researchers asked the six students to solve the prototype 2 questions and to provide comments or suggestions on them. The results of testing prototype 2 in the small group are shown in Table 5. Based on Table 5, the students with high mathematical ability have higher-order thinking skills scores of 92 (very good category) and 86.5 (good category). The students with moderate mathematical ability have scores of 86 and 84, both in the good category, while the students with low mathematical ability have scores of 79.5 and 78, both in the enough category. At this stage the students did not provide comments to correct the questions, but they gave the opinion that the questions are very interesting and challenging, although some students said that some questions are difficult to solve.
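The mapping from scores to categories can be sketched as a small function. The category boundaries are not stated in the text, so the cutoffs below are assumptions chosen only to be consistent with the scores reported at the one-to-one and small group stages; only the 75 boundary is tied to the completeness criterion used later in the field test.

```python
def categorize(score):
    """Map a test score to a category using ASSUMED cutoffs.

    The cutoffs 90 and 80 are hypothetical; 75 mirrors the
    'score of more than 75' completeness criterion in the text.
    """
    if score > 90:
        return "very good"
    if score > 80:
        return "good"
    if score > 75:
        return "enough"
    return "bad"


# Scores and categories as reported for Tables 3 and 5.
reported = {
    88: "good", 80: "enough", 76: "enough",      # one-to-one stage
    92: "very good", 86.5: "good", 86: "good",   # small group stage
    84: "good", 79.5: "enough", 78: "enough",
}

# Check that the assumed cutoffs reproduce every reported category.
mismatches = {s: c for s, c in reported.items() if categorize(s) != c}
print("mismatches:", mismatches)
```

Other cutoffs would fit the same nine scores equally well, so the function should be read as one plausible reconstruction rather than the rubric actually used.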

Field Test
The field test was the final stage that the researchers went through. In the small group stage, no students gave comments to improve the questions. Therefore, the researchers made no revisions before the field test; in other words, the questions tested at the small group stage and the field test stage were the same. The purpose of the field test is to examine the effectiveness and practicality criteria. Effectiveness is the potential effect that students get after working on the developed questions. The data collection instrument used to determine the effectiveness of the questions was a test. The effectiveness criterion is that 85% of students meet the criteria for mastery learning for low cognitive levels and high cognitive levels. Practicality relates to the positive response given by students regarding the questions being tested. The data collection instrument used to determine practicality is the student response sheet. The practicality criterion is that 85% of students have a positive attitude towards the questions. At the field test stage, students completed 9 questions, which included 6 multiple choice questions and 3 essay questions. Student scores are used as data to determine whether the questions developed are effective or not. The questions are considered effective in this research if at least 85% of students have a score of more than 75. The results of the field test are shown in Table 6. Based on Table 6, there are 4 students (16%) in the very good category, 8 students (32%) in the good category, 10 students (40%) in the enough category, and 3 students (12%) in the bad category. Based on these results, 22 students (88%) reached completeness on the function composition questions that measure higher-order thinking skills. Therefore, the instruments that have been developed meet the criteria for effectiveness.
In addition to solving the problems that have been developed, students filled out a student assessment questionnaire on the developed questions. Student response data are used to assess whether the questions developed are practical or not. The questions are said to be practical if at least 85% of students give a positive response. The students' responses to the questions are shown in Table 7. Based on Table 7, 22 students (88%) gave a positive response to the questions that had been developed, and 3 students (12%) gave a positive enough response. No students gave a negative or negative enough response. Therefore, the questions that have been developed meet the criteria of practicality. Previous research has also demonstrated that questions devised to assess students' cognitive level fit the requirements of being valid, practical, and effective (Budiman & Jailani, 2014; Husna et al., 2018; Muklis et al., 2018). In addition, research that develops questions to measure students' higher-order thinking skills also meets the valid, practical, and effective criteria (Arifin & Retnawati, 2017; Fadlila & Sagala, 2021; Gusdinata & Somakim, 2020; Kusaeri et al., 2018; Nursalam et al., 2018; Oktaviana & Susiaty, 2020; Prabowo et al., 2021; Sagala & Andriani, 2019; Wulandari et al., 2020; Yunita et al., 2018; Zaki et al., 2020). This shows that quite a lot of questions have been developed, but questions that measure students' cognitive level have not yet been developed for all mathematical material.
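The two field-test criteria reduce to comparing a proportion of the 25 students against the 85% threshold. The sketch below reproduces that check from the reported counts. It assumes that the 8 students reported without a category label fall in the good category and that completeness corresponds to every category except bad; both readings are inferred from the reported total of 22 students and are not stated outright. The variable names are illustrative.

```python
THRESHOLD = 0.85   # minimum share of students required by both criteria
N_STUDENTS = 25    # field test subjects

# Field test score categories (Table 6); "good" is the presumed
# label for the 8 students reported without a category.
score_categories = {"very good": 4, "good": 8, "enough": 10, "bad": 3}

# Student responses to the questions (Table 7).
responses = {"positive": 22, "positive enough": 3,
             "negative enough": 0, "negative": 0}

# Assumption: students in any category other than "bad" scored above 75.
completed = sum(n for cat, n in score_categories.items() if cat != "bad")

effectiveness_share = completed / N_STUDENTS
practicality_share = responses["positive"] / N_STUDENTS

effective = effectiveness_share >= THRESHOLD
practical = practicality_share >= THRESHOLD

print(f"effectiveness: {effectiveness_share:.0%} -> {effective}")
print(f"practicality:  {practicality_share:.0%} -> {practical}")
```

Both shares come out at 22 of 25 students, which is why the same 88% figure appears for effectiveness and practicality.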
After carrying out the procedure for developing questions using the formative evaluation model, it was found that the questions developed met the criteria of being valid, effective, and practical. To produce this product, one revision was made based on comments from the validators (expert reviews) and students (one-to-one). The suggestions given are to use terms consistently, take more care in preparing the answer key, use illustrations for contextual problems, use infographics to make the questions more interesting, check more thoroughly for typing errors, and use simpler sentences. These suggestions can be used as a reference by researchers or teachers who want to develop questions.

D. CONCLUSION AND SUGGESTIONS
According to the study's findings, the questions developed met the requirements of being valid, practical, and effective. Validity is demonstrated by the experts' appraisal of the questions, which were assessed on three aspects: content, construct, and language. Practicality is demonstrated by the students' assessment forms on the developed questions: 88 percent of the students responded positively to the questions, which exceeds the 85 percent requirement. Effectiveness is demonstrated by the number of students who fulfill the minimum completeness criterion: 88 percent of the students achieved it, which exceeds the 85 percent requirement. Other researchers are encouraged to design questions that test students' higher-order thinking skills on different senior high school materials. Furthermore, teachers should assign mathematical problems that test lower- and higher-order thinking skills proportionately, allowing students to develop higher-order thinking skills. Teachers can use the developed questions when teaching function composition so that questions on lower- and higher-order thinking skills are balanced. The developed questions can also be used as a reference for teachers who develop senior high school mathematics problems on other materials.

ACKNOWLEDGEMENT
The researchers would like to express their gratitude to Public Senior High School 1 Tebing Tinggi for granting permission to conduct the study, as well as to the students who volunteered to be research subjects. The researchers also express their gratitude to the validators who were willing to help improve the questions that were created.