Evaluating the Quality of Mid-Semester Mathematics Summative Assessment in Secondary School: A Psychometric Analysis of Test Items

Yoga Tegar Santosa, Dini Wardani Maulida, Juli Ferdianto, Sri Sutarni, Yulia Maftuhah Hidayati

Abstract


Mid-semester summative assessments play a crucial role in supporting competency-based learning under the Kurikulum Merdeka (Merdeka Curriculum). However, existing studies and field practices indicate a persistent gap: teachers rarely conduct systematic psychometric evaluations of the tests they construct. Addressing this gap, this study aims to (1) analyze the structure and characteristics of a mid-semester mathematics summative assessment and (2) evaluate the quality of its items against psychometric criteria within the framework of Classical Test Theory (CTT). Using a mixed-methods sequential exploratory design, data were obtained from two mathematics education experts, two mathematics teachers, and a school principal at an Islamic Integrated Secondary School in Sukoharjo Regency. Data sources included interview transcripts, assessment documents, students’ response sheets, and expert validation forms. Qualitative data were analyzed through data reduction, display, and conclusion drawing, while quantitative data were examined using Aiken’s V and CTT item statistics. The findings reveal that the assessment consisted of 40 multiple-choice items and 5 essay questions covering the Number and Algebra elements of Phase D in the Merdeka Curriculum. The items’ content validity was moderate, with strengths in language but weaknesses in cognitive-level alignment. Empirical results showed that several multiple-choice items were invalid, while all essay questions were valid and reliable (r = 0.88). Most items were moderately difficult, with discrimination indices ranging from fair to excellent (0.3 ≤ D ≤ 0.8). However, nearly one-third of the distractors in the multiple-choice items did not function well. These results highlight the need for improved item construction and teacher capacity-building to ensure that assessments align with the principles of the Kurikulum Merdeka and support high-quality measurement of student competency.
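To make the analysis concrete, the minimal Python/NumPy sketch below illustrates the statistics the abstract reports: Aiken's V for expert content-validity ratings, V = Σ(r − lo) / [n(c − 1)] following Aiken (1985); the proportion-correct difficulty index; the upper-lower discrimination index D in the Ebel and Frisbie (1991) tradition; and Cronbach's alpha for reliability. All data, scale bounds, and function names here are illustrative assumptions, not the authors' actual instrument or scores.

import numpy as np

def aikens_v(ratings, lo=1, hi=5):
    # Aiken's V for a single item: V = sum(s) / (n * (c - 1)),
    # where s = rating - lo and c is the number of rating categories.
    ratings = np.asarray(ratings)
    c = hi - lo + 1
    return (ratings - lo).sum() / (ratings.size * (c - 1))

def difficulty(item_scores):
    # Proportion-correct difficulty index p for a dichotomous item
    # (1 = correct, 0 = incorrect); roughly 0.3 <= p <= 0.7 is "moderate".
    return float(np.mean(item_scores))

def discrimination(item_scores, total_scores, frac=0.27):
    # Upper-lower discrimination index D: proportion correct among the
    # top 27% of total scorers minus that among the bottom 27%.
    order = np.argsort(total_scores)
    k = max(1, int(round(frac * len(total_scores))))
    return float(item_scores[order[-k:]].mean() - item_scores[order[:k]].mean())

def cronbach_alpha(matrix):
    # Cronbach's alpha (equals KR-20 for dichotomous items):
    # (k / (k - 1)) * (1 - sum of item variances / variance of total scores).
    k = matrix.shape[1]
    item_var = matrix.var(axis=0, ddof=1).sum()
    total_var = matrix.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Illustrative inputs (assumed): two expert raters on a 5-point scale,
# and a simulated 30-student x 40-item multiple-choice response matrix.
print(f"Aiken's V = {aikens_v([4, 4]):.2f}")            # -> 0.75

rng = np.random.default_rng(0)
responses = (rng.random((30, 40)) > 0.4).astype(int)
totals = responses.sum(axis=1)
print(f"Item 1: p = {difficulty(responses[:, 0]):.2f}, "
      f"D = {discrimination(responses[:, 0], totals):.2f}")
print(f"alpha = {cronbach_alpha(responses):.2f}")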

Keywords


Merdeka Curriculum; Psychometric Analysis; Secondary School; Summative Assessment.


References


Ahmad, A., Judijanto, L., Jeranah, Halomoan, J. L. A., & Ichsan, M. (2024). Barriers and Difficulties of Students in the Mathematics Learning Process in Junior High Schools. Journal of Education Research and Evaluation, 8(2), 306–316. https://doi.org/10.23887/jere.v8i2.74056

Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40(4), 955–959. https://doi.org/10.1177/001316448004000419

Aiken, L. R. (1985). Three Coefficients for Analyzing the Reliability and Validity of Ratings. Educational and Psychological Measurement, 45(1), 131–142. https://doi.org/10.1177/0013164485451012

Allen, M. J., & Yen, W. M. (2001). Introduction to Measurement Theory. Waveland Press.

Alonzo, D., Labad, V., Bejano, J., & Guerra, F. (2021). The Policy-driven Dimensions of Teacher Beliefs about Assessment. Australian Journal of Teacher Education, 46(3), 36–52. https://doi.org/10.14221/ajte.2021v46n3.3

Anderson-Levitt, K. (2025). The deficit model in PISA assessments of competencies: counter-evidence from anthropology. Globalisation, Societies and Education, 23(4), 942–958. https://doi.org/10.1080/14767724.2023.2223141

Andriatna, R., Sujadi, I., Budiyono, Kurniawati, I., Wulandari, A. N., & Puteri, H. A. (2024). Junior high school students’ numeracy in geometry and measurement content: Evidence from the minimum competency assessment result. Proceeding of the 7th National Conference on Mathematics and Mathematics Education (SENATIK). https://doi.org/10.1063/5.0194570

Ayanwale, M. A., Chere-Masopha, J., & Morena, M. C. (2022). The Classical Test or Item Response Measurement Theory: The Status of the Framework at the Examination Council of Lesotho. International Journal of Learning, Teaching and Educational Research, 21(8), 384–406. https://doi.org/10.26803/ijlter.21.8.22

Awalurahman, H. W., & Budi, I. (2024). Automatic distractor generation in multiple-choice questions: a systematic literature review. PeerJ Computer Science, 10(2), 1–27. https://doi.org/10.7717/peerj-cs.2441

Bahena, R. D., Kilag, O. K. T., Andrin, G. R., Diano, F. M., & Unabia, R. P. (2024). From Method to Equity: Rethinking Mathematics Assessment Policies in Education. EXCELLENCIA: International Multi-Disciplinary Journal of Education, 2(1), 121–132. https://multijournals.org/index.php/excellencia-imje/article/view/281

Bhat, S. K., & Prasad, K. H. (2021). Item analysis and optimizing multiple-choice questions for a viable question bank in ophthalmology. Indian Journal of Ophthalmology, 69(2), 343–346. https://doi.org/10.4103/ijo.IJO_1610_20

Butakor, P. K. (2022). Using Classical Test and Item Response Theories to Evaluate Psychometric Quality of Teacher-Made Test in Ghana. European Scientific Journal, 18(1), 139–168. https://doi.org/10.19044/esj.2022.v18n1p139

Charles, K. J. (2023). Hyflex Instruction: Using Results from Mid-Semester Evaluations for Improvement. International Journal of Science and Research (IJSR), 12(9), 325–336. https://doi.org/10.21275/SR23901210226

Creswell, J. W., & Clark, V. L. P. (2017). Designing and Conducting Mixed Methods Research (3rd ed.). SAGE Publications.

Dewi, W. O., & Prabowo, A. (2022). Item Analysis of the Mid-Semester Assessment for Grade VIII A Mathematics in the 2018/2019 Academic Year at SMP Negeri 3 Mlati. AdMathEduSt: Jurnal Ilmiah Mahasiswa Pendidikan Matematika, 9(2), 76–83. https://doi.org/10.12928/admathedust.v9i2.25347

Ebel, R. L., & Frisbie, D. A. (1991). Essentials of Educational Measurement (5th ed.). Prentice Hall.

Elgadal, A. H., & Mariod, A. A. (2021). Item Analysis of Multiple-choice Questions (MCQs): Assessment Tool For Quality Assurance Measures. Sudan Journal of Medical Sciences, 16(3), 334–346. https://doi.org/10.18502/sjms.v16i3.9695

Farida, F., & Musyarofah, A. (2021). Validity and Reliability in Item Analysis. Al-Mu’arrib: Jurnal Pendidikan Bahasa Arab, 1(1), 34–44. https://doi.org/10.32923/al-muarrib.v1i1.2100

Feldman, L. I. (2025). The Role of Assessment in Improving Education and Promoting Educational Equity. Education Sciences, 15(2), 1–11. https://doi.org/10.3390/educsci15020224

Fitria, N. N., Mufidah, L. L. N., & Setiawati, P. (2024). Summative Assessment of Islamic Education Subject in Merdeka Curriculum. Journal of Educational Research and Practice, 2(3), 328–338. https://doi.org/10.70376/jerp.v2i3.157

Ghimire, L. (2021). Assessment of the policy. In Multilingualism in Education in Nepal (pp. 128–150). Routledge India. https://doi.org/10.4324/9781003159964-7

Ginting, P., Hasnah, Y., Hasibuan, S. H., & Batubara, I. H. (2021). Evaluating Cognitive Level of Final Semester Examination Questions Based on Bloom’s Revised Taxonomy. AL-ISHLAH: Jurnal Pendidikan, 13(1), 186–195. https://doi.org/10.35445/alishlah.v13i1.385

Griffin, P., Care, E., Francis, M., & Scoular, C. (2014). The Role of Assessment in Improving Learning in a Context of High Accountability. In Designing Assessment for Quality Learning (pp. 73–87). Springer. https://doi.org/10.1007/978-94-007-5902-2_5

Hadi, A. F. M. Q. Al, Listari, D. A., Meilawati, A., & Inayati, N. L. (2024). Implementation of Summative Evaluation in Islamic Education Learning at SMPN 1 Surakarta. TSAQOFAH, 4(1), 769–778. https://doi.org/10.58578/tsaqofah.v4i1.2570

Hadzhikoleva, S., Hadzhikolev, E., Gaftandzhieva, S., & Pashev, G. (2025). A conceptual framework for multi-component summative assessment in an e-learning management system. Frontiers in Education, 10(1), 1–12. https://doi.org/10.3389/feduc.2025.1656092

Halimi, K., & Seridi-Bouchelaghem, H. (2021). Students’ competencies discovery and assessment using learning analytics and semantic web. Australasian Journal of Educational Technology, 37(5), 77–97. https://doi.org/10.14742/ajet.7116

Hartati, N., & Yogi, H. P. S. (2019). Item Analysis for a Better Quality Test. English Language in Focus, 2(1), 57–70. https://doi.org/10.24853/elif.2.1.59-70

Heil, J., & Ifenthaler, D. (2023). Online Assessment in Higher Education: A Systematic Review. Online Learning, 27(1), 187–218. https://doi.org/10.24059/olj.v27i1.3398

Ishaq, K., Majid, A., Rana, K., Azan, N., & Zin, M. (2020). Exploring Summative Assessment and Effects: Primary to Higher Education. Bulletin of Education and Research, 42(3), 23–50. https://eric.ed.gov/?id=EJ1291061

Kemendikbudristek. (2022). Learning and Assessment in Early Childhood, Primary, and Secondary Education (Pembelajaran dan Asesmen Pendidikan Anak Usia Dini, Pendidikan Dasar, dan Menengah).

Kenea, T. G., Mikire, F., & Negawo, Z. (2023). The Psychometric Properties and Performances of Teacher-Made Tests in Measuring Students’ Academic Performance in Ethiopian Public Universities: Baseline Survey Study. Research Square, 1(4), 1–23. https://doi.org/10.21203/rs.3.rs-3095433/v1

Kissi, P., Baidoo-Anu, D., Anane, E., & Annan-Brew, R. K. (2023). Teachers’ test construction competencies in examination-oriented educational system: Exploring teachers’ multiple-choice test construction competence. Frontiers in Education, 8(1), 1–14. https://doi.org/10.3389/feduc.2023.1154592

Klee, H. L., & Miller, A. D. (2019). Moving Up! Or Down? Mathematics Anxiety in the Transition From Elementary School to Junior High. The Journal of Early Adolescence, 39(9), 1311–1336. https://doi.org/10.1177/0272431618825358

Koçdar, S., Karadag, N., & Sahin, M. D. (2016). Analysis of the Difficulty and Discrimination Indices of Multiple-Choice Questions According to Cognitive Levels in an Open and Distance Learning Context. The Turkish Online Journal of Educational Technology, 15(4), 16–24. https://eric.ed.gov/?id=EJ1117619

Mahphoth, M. H., Sulaiman, Z., Koe, W., Kamarudin, N. A., & Dirgantari, P. D. (2021). Psychometric Assessment of Young Visitors at the National Museum of Malaysia. Asian Journal of University Education, 17(2), 1–13. https://doi.org/10.24191/ajue.v17i2.13396

Malapane, T. A., & Ndlovu, N. K. (2024). Assessing the Reliability of Likert Scale Statements in an E-Commerce Quantitative Study: A Cronbach Alpha Analysis Using SPSS Statistics. 2024 Systems and Information Engineering Design Symposium (SIEDS), 90–95. https://doi.org/10.1109/SIEDS61124.2024.10534753

Manfaat, B., Nurazizah, A., & Misri, M. A. (2021). Analysis of mathematics test items quality for high school. Jurnal Penelitian Dan Evaluasi Pendidikan, 25(1), 108–117. https://doi.org/10.21831/pep.v25i1.39174

Marsevani, M. (2022). Item Analysis Of Multiple-Choice Questions: An Assessment Of Young Learners. English Review: Journal of English Education, 10(2), 401–408. https://doi.org/10.25134/erjee.v10i2.6241

Masyitoh, M., Ahda, Y., Hartanto, I., & Darussyamsu, R. (2020). An Analysis of High Order Thinking Skills Aspects on the Assessment Instruments Environmental Change Topic for the 10th Grade Senior High School Students. Jurnal Atrium Pendidikan Biologi, 5(4), 1–7. https://doi.org/10.24036/apb.v5i4.6945

Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative Data Analysis: A Methods Sourcebook (3rd ed.). SAGE Publications.

Mumpuni, K. E., & Ramli, M. (2018). Students’ Understanding and Approvement toward Assessment for Learning. BIOEDUKASI: Jurnal Pendidikan Biologi, 11(1), 55–60. https://jurnal.uns.ac.id/bioedukasi/article/download/19746/pdf

Nitko, A. J., & Brookhart, S. M. (2011). Educational Assessment of Students (6th ed.). Pearson/Allyn & Bacon.

Nitko, A. J., & Brookhart, S. M. (2019). Educational Assessment of Students (8th ed.). Pearson.

Nurjanah, S., Iqbal, M., Zafrullah, Z., Mahmud, M. N., Seran, D. S. F., Suardi, I. K., & Arriza, L. (2024). Psychometric quality of multiple-choice tests under classical test theory (CTT): AnBuso, Iteman, and R. Jurnal Penelitian Dan Evaluasi Pendidikan, 28(2), 161–172. https://doi.org/10.21831/pep.v28i2.71542

Odukoya, J. A., & Omonijo, D. O. (2024). Discriminatory indices of ‘introduction to psychology’ multiple choice examination questions. Edelweiss Applied Science and Technology, 8(6), 8833–8847. https://doi.org/10.55214/25768484.v8i6.3880

Orhani, S. (2024). Preparation of Tests from the Subject of Mathematics According to Bloom’s Taxonomy. International Journal of Research Publication and Reviews, 5(2), 2335–2345. https://doi.org/10.55248/gengpi.5.0224.0542

Pokropek, A., Marks, G. N., & Borgonovi, F. (2022). How much do students’ scores in PISA reflect general intelligence and how much do they reflect specific abilities? Journal of Educational Psychology, 114(5), 1121–1135. https://doi.org/10.1037/edu0000687

Popham, W. J. (2017). Classroom Assessment: What Teachers Need to Know. Pearson Education.

Priyatni, E. T., & Martutik. (2020). The Development of a Critical–Creative Reading Assessment Based on Problem Solving. Sage Open, 10(2), 1–9. https://doi.org/10.1177/2158244020923350

Rahmadani, N., & Hidayati, K. (2023). Quality of Mathematics Even Semester Final Assessment Test in Class VIII Using R Program. Jurnal Pendidikan Matematika, 17(3), 397–416. https://doi.org/10.22342/jpm.17.3.20627.397-416

Raykov, T., & Zhang, B. (2025). The One-Parameter Logistic Model Can Be True With Zero Probability for a Unidimensional Measuring Instrument: How One Could Go Wrong Removing Items Not Satisfying the Model. Educational and Psychological Measurement, 85(4). https://doi.org/10.1177/00131644251345120

Regina, A. (2024). Assessment Rubric for Historical Thinking Skills in Accordance with the Kurikulum Merdeka. EDUTEC : Journal of Education And Technology, 7(4). https://doi.org/10.29062/edu.v7i4.784

Retnawati, H. (2022). Estimating Item Parameters and Student Abilities: An IRT 2PL Analysis of Mathematics Examination. AL-ISHLAH: Jurnal Pendidikan, 14(1), 385–398. https://doi.org/10.35445/alishlah.v14i1.926

Retnawati, H., Kartowagiran, B., Arlinwibowo, J., & Sulistyaningsih, E. (2017). Why are the Mathematics National Examination Items Difficult and What Is Teachers’ Strategy to Overcome It? International Journal of Instruction, 10(3), 257–276. https://doi.org/10.12973/iji.2017.10317a

Rezigalla, A. A., Eleragi, A. M. E. S. A., Elhussein, A. B., Alfaifi, J., ALGhamdi, M. A., Al Ameer, A. Y., Yahia, A. I. O., Mohammed, O. A., & Adam, M. I. E. (2024). Item analysis: the impact of distractor efficiency on the difficulty index and discrimination power of multiple-choice items. BMC Medical Education, 24(1), 445–451. https://doi.org/10.1186/s12909-024-05433-y

Roach, V. A. (2025). Validity: Conceptualizations for anatomy and health professions educators. Anatomical Sciences Education, 18(8), 751–756. https://doi.org/10.1002/ase.70016

Rush, B. R., Rankin, D. C., & White, B. J. (2016). The impact of item-writing flaws and item complexity on examination item difficulty and discrimination value. BMC Medical Education, 16(1), 250–259. https://doi.org/10.1186/s12909-016-0773-3

Shakurnia, A., Ghafourian, M., Khodadadi, A., Ghadiri, A., Amari, A., & Shariffat, M. (2022). Evaluating Functional and Non-Functional Distractors and Their Relationship with Difficulty and Discrimination Indices in Four-Option Multiple-Choice Questions. Education in Medicine Journal, 14(4), 55–62. https://doi.org/10.21315/eimj2022.14.4.5

Shankar, D. R., Singh, D. H. P., Dewan, D. S., & Singh, D. R. (2024). An In-Depth Analysis of Multiple-Choice Question Quality In Community Medicine Examinations: Evaluating Implications For Competency-Based Medical Education At Noida International Institute Of Medical Sciences (NIIMS). African Journal of Biomedical Research, 27(45), 13959–13964. https://doi.org/10.53555/AJBR.v27i4S.7072

Sozer, E. M., Zeybekoglu, Z., & Kaya, M. (2019). Using mid-semester course evaluation as a feedback tool for improving learning and teaching in higher education. Assessment & Evaluation in Higher Education, 44(7), 1003–1016. https://doi.org/10.1080/02602938.2018.1564810

Stankous, N. V. (2016). Constructive Response Vs. Multiple-Choice Tests In Math: American Experience And Discussion (Review). European Scientific Journal, 12(10), 1–9. https://doi.org/10.19044/esj.2016.v12n10p%p

Terao, T., & Ishii, H. (2020). A Comparison of Distractor Selection Among Proficiency Levels in Reading Tests: A Focus on Summarization Processes in Japanese EFL Learners. Sage Open, 10(1), 1–14. https://doi.org/10.1177/2158244020902087

Ukobizaba, F., Nizeyimana, G., & Mukuka, A. (2021). Assessment Strategies for Enhancing Students’ Mathematical Problem-solving Skills: A Review of Literature. Eurasia Journal of Mathematics, Science and Technology Education, 17(3), 1–10. https://doi.org/10.29333/ejmste/9728

Vincent, W., & Shanmugam, S. K. S. (2020). The Role of Classical Test Theory to Determine the Quality of Classroom Teaching Test Items. Pedagogia : Jurnal Pendidikan, 9(1), 5–34. https://doi.org/10.21070/pedagogia.v9i1.123

Wahyuni, A., Muhaimin, L. H., Hendriyanto, A., & Tririnika, Y. (2024). Exploring Middle School Students’ Challenges in Mathematical Literacy: A Study on AKM Problem-Solving. AL-ISHLAH: Jurnal Pendidikan, 16(3), 3335–3349. https://doi.org/10.35445/alishlah.v16i3.5729

Wati, D. D. E., Dewi, R. K., & Amri, C. (2023). Analysis of student ability formulating learning objectives in natural science phase D kurikulum merdeka. Jurnal Atrium Pendidikan Biologi, 8(1), 15–21. https://doi.org/10.24036/apb.v8i1.14028

Xiromeriti, M., & Newton, P. M. (2024). Solving Not Answering. Validation of Guidance for Writing Higher-Order Multiple-Choice Questions in Medical Science Education. Medical Science Educator, 34(6), 1469–1477. https://doi.org/10.1007/s40670-024-02140-7

Xuyen, P. T. M. (2023). Exploring the Efficacy of Summative Assessment to Promote the Continuous Improvement of Students’ English Proficiency. US-China Education Review B, 13(6), 346–357. https://doi.org/10.17265/2161-6248/2023.06.002

Zainina, K. A., Mufiqoh, M. Z., Aprilia, N., & Isnaeni, B. (2025). Rasch Model: Analysis of Biology Question Item in the Indonesia Independent Curriculum. Jurnal Penelitian Pendidikan IPA, 10(12), 10990–10998. https://doi.org/10.29303/jppipa.v10i12.7661




DOI: https://doi.org/10.31764/ijeca.v8i3.36276



Copyright (c) 2025 Yoga Tegar Santosa, Dini Wardani Maulida, Juli Ferdianto, Sri Sutarni, Yulia Maftuhah Hidayati

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
