CART Classification on Ordinal Scale Data with Unbalanced Proportions using Ensemble Bagging Approach
Abstract
CART (Classification and Regression Trees) is a decision-tree algorithm used in data exploration. When class proportions are unbalanced, the classification of the minority class can be inaccurate. One way to overcome this imbalance is an ensemble bagging algorithm, which performs classification with models fitted to resamples of the data and can thereby reduce the bias caused by imbalanced data. The data are secondary data from Fernandes and Solimun's 2023 research report. The sample consists of 100 respondents, mothers with toddlers in Wajak village, Malang Regency, whose data passed validity and reliability tests. The results show that the ensemble bagging CART method handles the imbalance in class proportions better, with accuracy, sensitivity, specificity, and F1-score of 85%, 94.1%, 66.7%, and 78%, respectively. This research is limited to the Sumberputih Village area, so its results are representative only of the Wajak District area.
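The bagging CART idea summarized above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: each tree is grown with the Gini criterion and binary `x <= t` splits (which respect the order of ordinal scores), without cost-complexity pruning; each tree is fit to a bootstrap resample of the training set; and the ensemble predicts by majority vote. Accuracy, sensitivity, specificity, and F1-score follow the standard confusion-matrix definitions. All function names, the depth limit, the number of trees, and the synthetic data in the demo are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) whose split x[j] <= t minimises weighted Gini.

    Splits have the form x[j] <= t with t an observed value, so ordinal scores
    (e.g. 1-5) are only divided in an order-respecting way. None if no split helps.
    """
    n, best, best_score = len(y), None, gini(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Grow a small CART-style tree: a leaf is the majority label,
    an internal node is the tuple (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_size),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_size))

def predict_tree(node, x):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def bagging_fit(X, y, n_trees=25, seed=0, **tree_kwargs):
    """Bagging: fit one tree per bootstrap resample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], **tree_kwargs))
    return trees

def bagging_predict(trees, x):
    """Aggregate the ensemble by majority vote over the individual trees."""
    return Counter(predict_tree(tree, x) for tree in trees).most_common(1)[0][0]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity and F1 from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sens,
            "specificity": spec, "f1": f1}

if __name__ == "__main__":
    # Hypothetical synthetic data: 100 "respondents", three 1-5 Likert items,
    # minority class of roughly a quarter of cases. NOT the data used in the study.
    rng = random.Random(42)
    X = [[rng.randint(1, 5) for _ in range(3)] for _ in range(100)]
    y = [1 if row[0] + row[1] >= 8 or rng.random() < 0.05 else 0 for row in X]
    X_train, y_train, X_test, y_test = X[:80], y[:80], X[80:], y[80:]
    trees = bagging_fit(X_train, y_train, n_trees=50, max_depth=3)
    preds = [bagging_predict(trees, x) for x in X_test]
    print(binary_metrics(y_test, preds))
```

The hand-rolled version only makes the bootstrap-and-vote mechanics explicit; in practice a library implementation such as scikit-learn's `BaggingClassifier` wrapped around a `DecisionTreeClassifier` would normally be used instead.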
References
Altman, N., & Krzywinski, M. (2017). Ensemble methods: bagging and random forests. Nature Methods, 14(10), 933–934. https://doi.org/10.1038/nmeth.4438
Arrahimi, A. R., Ihsan, M. K., Kartini, D., Faisal, M. R., & Indriani, F. (2019). Teknik Bagging Dan Boosting Pada Algoritma CART Untuk Klasifikasi Masa Studi Mahasiswa [Bagging and boosting techniques in the CART algorithm for classifying students' study duration]. Jurnal Sains Dan Informatika, 5(1). https://doi.org/10.34128/jsi.v5i1.171
Bramer, M. (2016). Principles of Data Mining (3rd ed.). Springer London. https://doi.org/10.1007/978-1-4471-7307-6
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Routledge. https://doi.org/10.1201/9781315139470
Cendani, L. M., & Wibowo, A. (2022). Perbandingan Metode Ensemble Learning pada Klasifikasi Penyakit Diabetes [A comparison of ensemble learning methods for classifying diabetes]. Jurnal Masyarakat Informatika, 13(1), 33–44. https://doi.org/10.14710/jmasif.13.1.42912
Chen, R.-C., Dewi, C., Huang, S.-W., & Caraka, R. E. (2020). Selecting critical features for data classification based on machine learning methods. Journal of Big Data, 7(1), 52. https://doi.org/10.1186/s40537-020-00327-4
Chilyabanyama, O. N., Chilengi, R., Simuyandi, M., Chisenga, C. C., Chirwa, M., Hamusonde, K., Saroj, R. K., Iqbal, N. T., Ngaruye, I., & Bosomprah, S. (2022). Performance of Machine Learning Classifiers in Classifying Stunting among Under-Five Children in Zambia. Children, 9(7), 1082. https://doi.org/10.3390/children9071082
Daniya, T., Geetha, M., & Kumar, K. S. (2020). Classification and regression trees with gini index. Advances in Mathematics: Scientific Journal, 9(10), 8237–8247. https://doi.org/10.37418/amsj.9.10.53
De Prado, M. L. (2018). Advances in financial machine learning. John Wiley & Sons.
du Plooy, R., & Venter, P. J. (2021). A Comparison of Artificial Neural Networks and Bootstrap Aggregating Ensembles in a Modern Financial Derivative Pricing Framework. Journal of Risk and Financial Management, 14(6), 254. https://doi.org/10.3390/jrfm14060254
Efendi, A., Fitriani, R., Naufal, H. I., & Rahayudi, B. (2020). Ensemble Adaboost In Classification And Regression Trees To Overcome Class Imbalance In Credit Status Of Bank Customers. Journal of Theoretical and Applied Information Technology, 98(17), 3428–3437.
Erickson, B. J., & Kitamura, F. (2021). Magician’s Corner: 9. Performance Metrics for Machine Learning Models. Radiology: Artificial Intelligence, 3(3), e200126. https://doi.org/10.1148/ryai.2021200126
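Fernandes, A. A. R., & Solimun. (2023). Menelisik Faktor-Faktor Penyebab Stunting pada Anak di Kecamatan Wajak: Integrasi Cluster dengan Path Analysis dengan Pendekatan Statistika dan Sains Data [Examining the factors causing stunting in children in Wajak District: Integrating cluster analysis with path analysis through statistics and data science approaches].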
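Fitriani, R. D., Yasin, H., & Tarno. (2021). Penanganan Klasifikasi Kelas Data Tidak Seimbang Dengan Random Oversampling Pada Naive Bayes (Studi Kasus: Status Peserta KB IUD di Kabupaten Kendal) [Handling imbalanced-class data classification with random oversampling in naive Bayes (case study: status of IUD family planning participants in Kendal Regency)]. Jurnal Gaussian, 10(1), 11–20.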
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). The Annals of Statistics, 28(2), 337–407. https://doi.org/10.1214/aos/1016218223
Jafarzadeh, H., Mahdianpari, M., Gill, E., Mohammadimanesh, F., & Homayouni, S. (2021). Bagging and Boosting Ensemble Classifiers for Classification of Multispectral, Hyperspectral and PolSAR Data: A Comparative Evaluation. Remote Sensing, 13(21), 4405. https://doi.org/10.3390/rs13214405
Krzywinski, M., & Altman, N. (2017). Classification and regression trees. Nature Methods, 14(8), 757–758. https://doi.org/10.1038/nmeth.4370
Kumar, P., Bhatnagar, R., Gaur, K., & Bhatnagar, A. (2021). Classification of Imbalanced Data: Review of Methods and Applications. IOP Conference Series: Materials Science and Engineering, 1099(1), 012077. https://doi.org/10.1088/1757-899X/1099/1/012077
Kumari, S., Kumar, D., & Mittal, M. (2021). An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. International Journal of Cognitive Computing in Engineering, 2, 40–46. https://doi.org/10.1016/j.ijcce.2021.01.001
Li, X., & Zhang, L. (2021). Unbalanced data processing using deep sparse learning technique. Future Generation Computer Systems, 125, 480–484. https://doi.org/10.1016/j.future.2021.05.034
Luque, A., Carrasco, A., Martín, A., & de las Heras, A. (2019). The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 91, 216–231. https://doi.org/10.1016/j.patcog.2019.02.023
Mahmudiono, T., Sumarmi, S., & Rosenkranz, R. R. (2017). Household dietary diversity and child stunting in East Java, Indonesia. Asia Pacific Journal of Clinical Nutrition, 26(2), 317–325. https://search.informit.org/doi/10.3316/ielapa.688058173877148
Ngo, G., Beard, R., & Chandra, R. (2022). Evolutionary bagging for ensemble learning. Neurocomputing, 510, 1–14. https://doi.org/10.1016/j.neucom.2022.08.055
Siahaan, D., Wahyuningsih, S., & Amijaya, F. D. T. (2017). Aplikasi Classification and Regression Tree (CART) dan Regresi Logistik Ordinal dalam Bidang Pendidikan [Application of classification and regression tree (CART) and ordinal logistic regression in education]. EKSPONENSIAL, 7(1), 95–104. https://jurnal.fmipa.unmul.ac.id/index.php/exponensial/article/view/46
Siringoringo, R., & Jaya, I. K. (2018). Ensemble Learning Dengan Metode Smotebagging Pada Klasifikasi Data Tidak Seimbang [Ensemble learning with the SMOTEBagging method for imbalanced data classification]. Information System Development, 3(2), 75–81. https://ejournal-medan.uph.edu/index.php/isd/article/view/204
Wong, H. B., & Lim, G. H. (2011). Measures of Diagnostic Accuracy: Sensitivity, Specificity, PPV and NPV. Proceedings of Singapore Healthcare, 20(4), 316–318. https://doi.org/10.1177/201010581102000411
Zhao, C., Peng, R., & Wu, D. (2023). Bagging and Boosting Fine-tuning for Ensemble Learning. IEEE Transactions on Artificial Intelligence, 1–15. https://doi.org/10.1109/TAI.2023.3296685
DOI: https://doi.org/10.31764/jtam.v8i2.20201
Copyright (c) 2024 Luthfia Hanun Yuli Arini, Solimun, Achmad Efendi, Mohammad Ohid Ullah
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
JTAM (Jurnal Teori dan Aplikasi Matematika)