Effectiveness of Machine Learning Models with Bayesian Optimization-Based Method to Identify Important Variables that Affect GPA
Abstract
To produce superior human resources, the SPs-IPB Master Program must account for the factors that influence GPA during student selection. Machine learning algorithms offer a way to identify these factors. This paper applies the random forest and XGBoost algorithms to identify the variables that significantly affect GPA. In the evaluation, models with default hyperparameters are compared against models tuned by Bayesian optimization and by random search. Bayesian optimization is a hyperparameter tuning method that uses information from previous iterations to improve its estimates, making it highly efficient in computing time. Based on the average of the balanced accuracy and sensitivity metrics, Bayesian optimization produces models superior to the default models and is more time-efficient than random search. XGBoost's sensitivity is 25% better than random forest's, whereas random forest is 19% better in accuracy and 30% better in specificity. Important variables are identified from the information gain obtained when splitting the tree nodes. According to the best random forest and XGBoost models, the variables with the greatest influence on students' GPA are Undergraduate University Status (X8) and Undergraduate University (X6), while those with the smallest influence are Gender (X4) and Enrollment (X9).
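As a rough illustration of the workflow the abstract describes, the Python sketch below tunes an XGBoost classifier with Bayesian optimization under a balanced-accuracy criterion and then ranks variables by their total information gain at tree splits. This is not the authors' code: the choice of scikit-optimize, the search ranges, and the synthetic nine-feature data (standing in for X1-X9) are all assumptions made for the example.

# Minimal sketch: Bayesian hyperparameter optimization of XGBoost,
# then variable importance from information gain at tree splits.
# Synthetic data and search ranges are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

# Stand-in data: nine predictors mirroring the paper's X1-X9, binary GPA class.
X, y = make_classification(n_samples=400, n_features=9, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

search = BayesSearchCV(
    estimator=XGBClassifier(eval_metric="logloss", random_state=42),
    search_spaces={                      # illustrative hyperparameter ranges
        "n_estimators": Integer(100, 500),
        "max_depth": Integer(2, 10),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "subsample": Real(0.5, 1.0),
    },
    n_iter=30,                           # Bayesian optimization iterations
    scoring="balanced_accuracy",         # metric family used in the paper
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

# Information-gain importance: total gain each feature contributes
# across all splits in the boosted trees (keys are f0..f8 here).
gain = search.best_estimator_.get_booster().get_score(importance_type="gain")
print(sorted(gain.items(), key=lambda kv: kv[1], reverse=True))

Swapping BayesSearchCV for scikit-learn's RandomizedSearchCV over the same ranges would give the random-search baseline that the paper compares Bayesian optimization against.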
DOI: https://doi.org/10.31764/jtam.v8i3.21711
Copyright (c) 2024 Arifuddin R, Utami Dyah Syafitri, Erfiani
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.