Effectiveness of Machine Learning Models with Bayesian Optimization-Based Method to Identify Important Variables that Affect GPA
Abstract
To produce superior human resources, the SPs-IPB Master Program must account for the factors that influence GPA during student selection. Machine learning algorithms offer a way to identify these factors. This paper applies the random forest and XGBoost algorithms to identify the variables that significantly affect GPA. In the evaluation, models with default hyperparameters are compared against models tuned by Bayesian optimization and by random search. Bayesian optimization is a hyperparameter tuning method that uses information from previous iterations to improve its estimates, making it highly efficient in computing time. Based on the average of the balanced accuracy and sensitivity metrics, Bayesian optimization produces models superior to the default models and is more time-efficient than random search. XGBoost's sensitivity is 25% better than random forest's, whereas random forest is 19% better in accuracy and 30% better in specificity. Important variables are identified from the information gain obtained when splitting the tree nodes. According to the best random forest and XGBoost models, the variables with the greatest influence on students' GPA are Undergraduate University Status (X8) and Undergraduate University (X6), while those with the smallest influence are Gender (X4) and Enrollment (X9).
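As a rough illustration of the workflow the abstract describes, the Python sketch below tunes an XGBoost classifier with Bayesian optimization under a balanced-accuracy criterion and then ranks variables by their total information gain at tree splits. This is not the authors' code: the choice of scikit-optimize, the search ranges, and the synthetic nine-feature data (standing in for X1-X9) are all assumptions made for the example.

# Minimal sketch: Bayesian hyperparameter optimization of XGBoost,
# then variable importance from information gain at tree splits.
# Synthetic data and search ranges are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

# Stand-in data: nine predictors mirroring the paper's X1-X9, binary GPA class.
X, y = make_classification(n_samples=400, n_features=9, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

search = BayesSearchCV(
    estimator=XGBClassifier(eval_metric="logloss", random_state=42),
    search_spaces={                      # illustrative hyperparameter ranges
        "n_estimators": Integer(100, 500),
        "max_depth": Integer(2, 10),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "subsample": Real(0.5, 1.0),
    },
    n_iter=30,                           # Bayesian optimization iterations
    scoring="balanced_accuracy",         # metric family used in the paper
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

# Information-gain importance: total gain each feature contributes
# across all splits in the boosted trees (keys are f0..f8 here).
gain = search.best_estimator_.get_booster().get_score(importance_type="gain")
print(sorted(gain.items(), key=lambda kv: kv[1], reverse=True))

Swapping BayesSearchCV for scikit-learn's RandomizedSearchCV over the same ranges would give the random-search baseline that the paper compares Bayesian optimization against.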
DOI: https://doi.org/10.31764/jtam.v8i3.21711
Copyright (c) 2024 Arifuddin R, Utami Dyah Syafitri, Erfiani
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.