Development of Rubric of Higher Order Thinking Skills Assessment on Mathematics Learning

ABSTRACT


A. INTRODUCTION
Assessing higher-order thinking skills (HOTS) remains challenging for mathematics teachers in Indonesia. One study found that 79% of elementary teachers face obstacles in designing and implementing HOTS-based evaluation (Rapih & Sutaryadi, 2018). This finding is supported by research showing that teachers have difficulty preparing HOTS-oriented lesson plans and assessment formats (Gradini, 2021; Jelatu et al., 2019; Retnawati et al., 2017; Sujadi et al., 2020). These difficulties in turn affect students' ability to solve HOTS problems (Gradini et al., 2018; Ichsan et al., 2019; Rahmawatiningrum et al., 2019; Sa'Dijah et al., 2020; Santoso et al., 2021). The HOTS content in textbooks and assessment tools is essential and significantly affects students' achievement (Pratama & Retnawati, 2018). Another study showed that the tests constructed by most teachers do not measure the top three levels of Bloom's Taxonomy, which constitute the higher-order thinking levels (Abosalem, 2016). Accordingly, teachers need ready-to-use assessment instruments at the HOTS level (Malik et al., 2015).
This study relies on the definition of HOTS given by Brookhart (2010), who uses three terms: (1) HOTS as a transfer process, (2) HOTS as critical thinking, and (3) HOTS as problem-solving. Two of the most critical educational objectives are to promote retention and transfer (which, when it happens, indicates significant learning). Retention requires students to remember what they have learned, whereas transfer requires them not only to remember but also to make sense of and apply what they have learned (Anderson et al., 2001). In other words, HOTS as a transfer process produces meaningful learning, namely the ability of students to apply what they have learned to new situations, with or without direction. For HOTS as critical thinking, Brookhart draws on Norris and Ennis (1989), who described critical thinking as a reasonable and reflective process. This view is supported by a study proposing that critical thinking is essential to mathematical problem-solving skills (Peter, 2012).
Problem-solving is an activity that can help students hone and develop their Higher-Order Thinking Skills (HOTS) in mathematics (Abdullah et al., 2015). In developing HOTS as problem-solving, Brookhart refers to Brookhart and Nitko (2011) and Bransford et al. (2005). HOTS as problem-solving is a process that enables students to solve real problems in real life. Such problems are generally unique, so the procedures for solving them are also unique and non-routine. Consequently, to assess HOTS in mathematics learning, the teacher must examine the transfer process, critical thinking, and problem-solving skills.
Numerous studies have been conducted on the link between assessment and higher-order thinking skills. These studies showed that increased student achievement was linked to the use of tasks and exams involving intellectual work and critical thinking. For example, Pogrow developed a program that deploys HOTS for educationally disadvantaged students and students with learning difficulties. The program focuses on four types of thinking abilities: (1) metacognition, or the ability to think about thinking; (2) inference; (3) transfer, or generalizing ideas across contexts; and (4) information synthesis (Pogrow, 2005). Another study found that metacognitive training and instruction, both domain-general and domain-specific, improved children's performance in various fields (Zohar & Barzilai, 2015). Like Pogrow's program, the Mathematics Learning Discourse (MLD) project reported fostering higher-order thinking and academic language in urban mathematics classrooms (Staples & Truxaw, 2010). Assessing HOTS increases students' thinking skills, achievement, and motivation (Brookhart, 2010). Similarly, Widana et al. (2018) found that HOTS assessment effectively increases students' critical thinking in mathematics. According to Ercikan and Seixas (2020), developing assessments that provide meaningful information is essential.
Higgins, Hall, Baumfield, and Moseley conducted a meta-analysis of studies on the effects of thinking-skills interventions on students' cognition, achievement, and attitudes. They found that implementing higher-order thinking skills as an approach has a strong effect on (1) verbal and non-verbal reasoning; (2) reading, mathematics, and science tests; and (3) students' attitude and motivation (Higgins et al., 2005). Thus, thinking-skills interventions can aid student improvement in thinking, content-area achievement, and motivation. To hone students' capability to analyze, evaluate, and create, teachers have to choose an appropriate learning model, develop good material, and use an appropriate assessment (Rosidin et al., 2019). These programs reportedly assessed students' higher-order thinking skills using a series of tests and rubrics.
Generally, HOTS assessment measures the metacognitive dimension, not just the factual, conceptual, or procedural dimensions. The metacognitive dimension describes the ability of students to connect several different concepts, interpret, solve problems, deploy problem-solving strategies, find new methods, reason, and make the right decisions. Therefore, in constructing a HOTS assessment instrument, the teacher should consider the following characteristics: (1) transferring one concept to another; (2) processing and applying information; (3) looking for links between different kinds of information; (4) using information to solve problems; and (5) critically examining ideas and information. In constructing a HOTS assessment, Brookhart suggested following two principles: (1) using introductory material that is novel and allows students to gather information, and (2) managing cognitive complexity and difficulty separately, to avoid confusing the level of difficulty with the level of thinking.
Rubrics are widely regarded by teachers as a tool for assessing students' higher-order thinking skills. Some studies found that rubrics are effective for this purpose, e.g., the Marzano Rubric (Marzano & Kendall, 2007), the Simulation Thinking Rubric (Doolen, 2015), and the Assessment Evaluation Rubric (Tractenberg, 2020). However, studies of the effectiveness of rubrics in measuring students' higher-order thinking comprehensively in mathematics learning are limited, although problem-solving rubrics (Blyman et al., 2020; Di Leo et al., 2019; Gallagher et al., 2000), a critical thinking rubric (Saxton et al., 2012), and performance assessment rubrics (Borko et al., 1997; Lane et al., 1994) have been used. Therefore, to measure students' HOTS in mathematics learning, the teacher needs to develop an appropriate rubric.
This study aims to develop a Higher-Order Thinking Skills (HOTS) assessment rubric for mathematics learning. In particular, it focuses on a rubric for examining students' higher-order thinking skills that meets the criteria of being valid, reliable, and practical. Developing a HOTS assessment rubric is essential because teacher demand for one is high (Surya et al., 2020). Teachers also tend to evaluate students' understanding at the three bottom levels of Bloom's Taxonomy (Abosalem, 2016) because of their lack of knowledge of and access to assessment instruments (Ahmad et al., 2018). It is therefore essential to develop a rubric that teachers can use to measure students' HOTS in the mathematics classroom. This study contributes to filling the gap in effective HOTS assessment rubrics for mathematics learning.

B. METHODS
This study is developmental research. Plomp's developmental method, the generic model for educational design, was deployed to develop the HOTS assessment rubric. The method consists of four phases: (1) preliminary investigation, (2) design, (3) realization, and (4) implementation and evaluation (Nieveen & Folmer, 2013; Tjeerd Plomp, 2000). The preliminary investigation comprised five activities: front-end analysis, student condition analysis, material analysis, task analysis, and specification of learning objectives. The critical element in this phase is defining the problem. In the design phase, the blueprint of the rubric was designed by generating all the parts of the solution, comparing and evaluating the various alternatives, and then selecting the best design for the rubric. The rubric was designed and developed to measure students' higher-order thinking skills in mathematics learning. In the realization phase, the rubric was constructed using the HOTS aspects defined by Brookhart (2010): (1) the top three levels of Bloom's Taxonomy (analyze, evaluate, and create); (2) logical reasoning; and (3) problem-solving. The rubric also adapted a rubric on problem-solving (Kennedy High School, 2006). In the implementation and evaluation phase, the quality of the HOTS assessment rubric was examined using Akker's product/prototype quality criteria: (1) content validity, (2) reliability, and (3) practicality/usability (Nieveen & Folmer, 2013; Van De Akker et al., 2006).
The rubric was validated by two validators who are experts on HOTS in mathematics. Content validity was measured using the interrater agreement of the experts (Gregory, 2011). If the validity coefficient is high (R > 75%), the HOTS rubric can be stated to be valid. Otherwise, revisions are made based on suggestions from the validators or by reviewing the aspects with lower ratings. The rubric is then re-validated and re-analyzed until it meets the criterion.
Thatcher states that reliability is the extent to which an experiment, test, or any measurement procedure produces the same results on repeated trials (Thatcher, 2010). Reliability was measured using Kuder-Richardson Formula 21 (Brown, 2014), where K is the number of test items, p is the proportion of correct responses to each test item, and q is the proportion of incorrect responses to each test item.
The practicality/usability of the rubric was examined by 15 mathematics teachers from different junior high schools. The practicality questionnaire used a 5-point Likert scale and was analyzed using the product practicality criteria, which are expressed in terms of the mean, the standard deviation, and the empirical score x.
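The interrater content-validity check described above can be sketched in a few lines of Python. The formula itself is not reproduced in this text, so the sketch assumes the form commonly attributed to Gregory for two judges: the number of items that both judges rate as strongly relevant, divided by the total number of items. The function and parameter names are hypothetical and chosen only for illustration.

```python
def content_validity(both_weak: int, weak_strong: int,
                     strong_weak: int, both_strong: int) -> float:
    """Share of items that BOTH judges rate as strongly relevant (3 or 4).

    The four arguments are the cells of the 2x2 agreement table.
    Assumed formula (not shown in the text): D / (A + B + C + D).
    """
    total = both_weak + weak_strong + strong_weak + both_strong
    if total <= 0:
        raise ValueError("the agreement table must contain at least one item")
    return both_strong / total


def is_valid(coefficient: float, threshold: float = 0.75) -> bool:
    """Decision rule stated in the text: valid when the coefficient exceeds 75%."""
    return coefficient > threshold


# Cell counts as reported for the second validation round.
coefficient = content_validity(0, 3, 0, 13)
print(f"{coefficient:.4f}", is_valid(coefficient))
```

Under this assumption, the counts reported for the second validation round give 13/16, which is consistent with the validity coefficient of 0.81 reported in the results.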

C. RESULT AND DISCUSSION
Bloom's Taxonomy is commonly used in Indonesia because of its usefulness in categorizing learning objectives and assessments by cognitive level. The first aspect of the rubric assesses students' ability to analyze, evaluate, and create. Students' ability to analyze is examined through their ability to divide or structure information into smaller parts to identify patterns or relationships, recognize and distinguish the causes and effects of a complex scenario, and identify or formulate questions. Evaluation is examined through students' ability to assess solutions, ideas, and methodologies using suitable criteria or existing standards to ensure their effectiveness or benefits, to make, criticize, and test hypotheses, and to accept or reject a statement based on predetermined criteria. The creation level is measured through students' ability to generalize an idea or perspective, design a way to solve a problem, and organize elements or parts into a new structure that has not existed before. The second aspect of the HOTS rubric is logic and reasoning. This aspect is measured by students' understanding of how to generate mathematical models, the quality of the mathematical model created, the construction of the solution, the conclusion drawn, and the justification (judgment). The third aspect is problem-solving. This aspect is measured by (1) students' understanding of the topics, (2) the use of effective and appropriate problem-solving strategies that produce correct answers, and (3) written mathematical communication. Students' higher-order thinking was measured using a four-point rating scale: exemplary (4), proficient (3), develop (2), and emerging (1). Note that teachers need not use every aspect of the rubric; which aspects apply depends on the complexity of the HOTS problem. In assessing the top three levels of Bloom's Taxonomy, the rubric used depends on the cognitive level to which the problem is assigned.
The HOTS rubric that has been constructed is described as follows:

Expert Judge #2
Weak Relevance (item rated 1 or 2)      2    1
Strong Relevance (item rated 3 or 4)    1   12

The first validation shows that the product has a validity coefficient of 0.706 with a reliability of 0.809. If the coefficient of validity is high (RVI > 75%), then the rubric can be stated to be valid. Although the reliability is very high, the prototype of the HOTS rubric in mathematics is not yet valid and needs revision and re-validation. After the revision, the two judges were asked to validate the rubric again. The result is shown in Table 6.

Expert Judge #2
Weak Relevance (item rated 1 or 2)      0    3
Strong Relevance (item rated 3 or 4)    0   13

The second validation shows that the product has a validity coefficient of 0.81 with a reliability of 0.89. Since the coefficient of validity is high (RVI > 75%) and the reliability is also high, the HOTS rubric in mathematics can be stated to be valid and reliable.
Practicality was measured using the product practicality classification formula. The practicality questionnaire consists of 18 question items with a rating scale of five categories: Very Good (5), Good (4), Fair (3), Poor (2), and Bad (1). Applying the product practicality criteria in Table 3 gives the practicality criteria of the rubric shown in Table 7. The practicality score of the rubric is 75.08, so the HOTS rubric in mathematics is practical. Overall, the Higher-Order Thinking Skills (HOTS) rubric is valid, reliable, and practical to deploy in mathematics learning.
During the trial, the respondents' positive and negative comments were used for revision. They made positive comments on overall "user-friendliness." Their negative comments concerned the construction of the rubric, such as the need to reduce wordiness and to set the words that indicate the score in bold. The respondents were also confused by the scientific terms regarding mathematical reasoning and representation. Most of them did not understand the terms "justification," "representation," and "mathematical model." This problem was solved by defining each mathematical term below the rubric. The respondents and validators also suggested that the rubric use a one-page format, including a column for scoring and the conversion score for students' higher-order thinking skills. The revision was therefore made to make the rubric easier to use. The final version of the HOTS rubric in mathematics is shown as follows.
Recall that the rubric is applied to each student and yields that student's higher-order thinking skill score. The teacher then maps the level of each student using the HOTS level categories. The HOTS level categories shown in Table 8 are used to map the level of students' higher-order thinking skills as measured by the rubric. The score of each aspect is measured with the rubric and then summed to obtain the student's total score. The average score is then calculated and converted to measure the student's higher-order thinking skills, as shown in Table 9. Students' higher-order thinking skills are divided into the exemplary, proficient, develop, and emerging levels. These levels are not hierarchical; for example, a student need not pass through the 'emerging' level before reaching the 'develop' level.
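The scoring procedure described above (rate each aspect, sum the ratings, average them, and convert the average to a level) can be sketched as follows. Tables 8 and 9 are not reproduced in this text, so the cut-off values below are purely hypothetical placeholders rather than the conversion used in the study, and the function and aspect names are likewise illustrative assumptions.

```python
VALID_RATINGS = {1, 2, 3, 4}  # emerging, develop, proficient, exemplary

# HYPOTHETICAL cut-offs on the average rating; the actual conversion
# (Tables 8 and 9 of the paper) is not available in this text.
CUTOFFS = [(3.5, "exemplary"), (2.5, "proficient"), (1.5, "develop")]


def hots_level(ratings: dict[str, int]) -> tuple[int, float, str]:
    """Return (total score, average rating, level) for one student.

    `ratings` maps each rubric aspect that was used to its 1-4 rating.
    """
    if not ratings:
        raise ValueError("at least one rubric aspect must be rated")
    for aspect, rating in ratings.items():
        if rating not in VALID_RATINGS:
            raise ValueError(f"rating for {aspect!r} must be 1, 2, 3, or 4")

    total = sum(ratings.values())
    average = total / len(ratings)

    # Walk the cut-offs from highest to lowest; fall back to 'emerging'.
    for minimum, level in CUTOFFS:
        if average >= minimum:
            return total, average, level
    return total, average, "emerging"


# Example: one student rated on three aspects (aspect names are illustrative).
total, average, level = hots_level(
    {"analyze/evaluate/create": 4, "logic and reasoning": 3, "problem-solving": 3}
)
print(total, round(average, 2), level)
```

Because teachers need not use every aspect of the rubric, the sketch averages only over the aspects that were actually rated.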
Students on the 'emerging' level are described as students who cannot solve the problems by deploying the concept and formula. At this level, students provide incorrect or incomplete mathematical representations, explanations, and solutions. They also show a lack of understanding, and their answers are unclear, inaccurate, and irrelevant. Students on the 'develop' level are described as students who understand the material, although their answers are only partially clear and accurate. They have some difficulty making mathematical models and delivering evidence to support their premise, and they create only limited models to simplify complex situations. Students on this level cannot provide correct solutions with logical steps. They also analyze only a small part of the information given in the problem, which leads to misleading and confusing explanations when they communicate their answers.
Students on the 'proficient' level have a functional understanding of the material. Their answers are clear and complete, although they are not well explained and do not accurately reflect knowledge. On this level, students have an adequate understanding of how to produce mathematical models and deliver evidence to support their answers. They can also create models to simplify complex situations and construct logical solutions. Students can analyze most of the information given and deploy appropriate problem-solving strategies. However, some of the strategies they use are unnecessary, even though they produce the right answer. In communicating their answers, they use appropriate mathematical terms and representations.
On the 'exemplary' level, the highest level of HOTS, students have a clear and accurate understanding and produce accurate mathematical models. The evidence supporting their answers is clear, logical, and well explained. Students can create models to simplify complex situations and identify the limitations of those models. Moreover, they can construct logical, correct, and complete solutions with justification and identify the source of an error. They can also analyze all of the information in the mathematical problem, deploy effective and appropriate problem-solving strategies, and produce correct answers. In communicating their answers, students use appropriate mathematical terms and representations.

D. CONCLUSION AND SUGGESTIONS
The Higher-Order Thinking Skills (HOTS) rubric is valid, reliable, and practical for assessing students' HOTS in mathematics learning. The validity coefficient is 0.81 (high validity), the reliability coefficient is 0.89 (very reliable), and the practicality score is 75.08 (practical). The final version of the rubric is ready to be presented to a larger group of mathematics teachers for feedback, which will be valuable for refining it. However, the rubric still needs to be trialed to measure its effectiveness in assessing students' higher-order thinking skills. Future research on the HOTS rubric is also needed to examine its acceptance by and usefulness to mathematics teachers.