Multilevel Semiparametric Modeling with Overdispersion and Excess Zeros on School Dropout Rates in Indonesia

Arna Ristiyanti Tarida, Anik Djuraidah, Agus Mohamad Soleh

Abstract


This study identifies key factors influencing high school dropout rates in Indonesia using statistical models that account for two common features of dropout count data. The first is overdispersion: the variance exceeds the mean, whereas a Poisson model assumes the two are equal. The second is excess zeros: there are more zero counts (no dropouts) than a standard count distribution predicts. Ignoring either feature can bias conclusions. To address this, we compare three parametric models, namely the Zero-Inflated Poisson Mixed Model (ZIPMM), the Zero-Inflated Generalized Poisson Mixed Model (ZIGPMM), and the Zero-Inflated Negative Binomial Mixed Model (ZINBMM), with their semiparametric counterparts (SZIPMM, SZIGPMM, and SZINBMM). The semiparametric models use B-spline functions to capture nonlinear relationships between predictors and dropout rates flexibly. Model performance was evaluated with the Akaike Information Criterion (AIC) and the Root Mean Square Error (RMSE) over 100 simulation repetitions to assess robustness. The semiparametric ZIGPMM (SZIGPMM) outperformed the other models, achieving the lowest average AIC (18969.62), which suggests the best trade-off between model fit and complexity. The optimal B-spline configuration used 2 knot points and order 3, with a Generalized Cross-Validation (GCV) score of 9.4107. Key predictors of dropout include school status (public or private), student-teacher ratio, distance from home to school, parental education level, parental employment status, and number of siblings. These findings give education policymakers actionable evidence that reducing dropout rates effectively requires addressing structural and socioeconomic barriers.
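The abstract names the zero-inflated generalized Poisson (ZIGP) distribution as the basis of the best-performing model but does not state its parameterisation. The Python below is a minimal sketch that assumes the common mean-parameterised generalized Poisson form, with mean μ and variance μ(1 + αμ)². It shows how the zero-inflation probability π and the dispersion parameter α enter the likelihood, and how AIC is computed from that likelihood. The function names are hypothetical. The random effects and B-spline terms of the actual SZIGPMM are omitted: μ and π are simply supplied per observation instead of being estimated from linear predictors.

```python
import math
from typing import Sequence


def gp_logpmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of a mean-parameterised generalized Poisson distribution.

    Mean is mu and variance is mu * (1 + alpha * mu) ** 2, so alpha = 0
    recovers the ordinary Poisson and alpha > 0 models overdispersion.
    (Underdispersion, alpha < 0, needs extra support constraints and is
    not handled in this sketch.)
    """
    if y < 0 or mu <= 0 or alpha < 0:
        raise ValueError("requires y >= 0, mu > 0 and alpha >= 0")
    return (
        y * math.log(mu / (1 + alpha * mu))
        + (y - 1) * math.log1p(alpha * y)
        - math.lgamma(y + 1)
        - mu * (1 + alpha * y) / (1 + alpha * mu)
    )


def zigp_logpmf(y: int, mu: float, alpha: float, pi: float) -> float:
    """Zero-inflated GP: a structural zero with probability pi, otherwise a
    GP(mu, alpha) count (which can itself be zero)."""
    if not 0.0 <= pi < 1.0:
        raise ValueError("pi must lie in [0, 1)")
    if y == 0:
        # Zeros come from both the structural-zero and the count component.
        return math.log(pi + (1.0 - pi) * math.exp(gp_logpmf(0, mu, alpha)))
    return math.log1p(-pi) + gp_logpmf(y, mu, alpha)


def zigp_loglik(ys: Sequence[int], mus: Sequence[float],
                pis: Sequence[float], alpha: float) -> float:
    """Sum of ZIGP log-likelihood contributions. In a (semiparametric) mixed
    model, each mu and pi would come from linear predictors containing
    random effects and spline terms; here they are simply given."""
    if not len(ys) == len(mus) == len(pis):
        raise ValueError("ys, mus and pis must have the same length")
    return sum(zigp_logpmf(y, m, alpha, p) for y, m, p in zip(ys, mus, pis))


def aic(loglik: float, n_params: int) -> float:
    """Akaike Information Criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * loglik + 2.0 * n_params


if __name__ == "__main__":
    # Sanity check: with mu = 3, alpha = 0.2, pi = 0.3 the probabilities
    # should sum to about 1 and the mean should be (1 - pi) * mu = 2.1.
    probs = [math.exp(zigp_logpmf(y, 3.0, 0.2, 0.3)) for y in range(300)]
    print(sum(probs), sum(y * p for y, p in enumerate(probs)))
```

Setting alpha = 0 reduces the sketch to a zero-inflated Poisson likelihood, which is why, under this parameterisation, the ZIP model is a special case of the ZIGP model. AIC then compares the fitted models by penalising each additional parameter by 2.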

Keywords


Semiparametric multilevel; Overdispersion; Excess zeros; B-spline; School dropout rates.


DOI: https://doi.org/10.31764/jtam.v9i3.30102



Copyright (c) 2025 Arna Ristiyanti Tarida, Anik Djuraidah, Agus Mohamad Soleh

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


JTAM (Jurnal Teori dan Aplikasi Matematika) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
