CART Classification on Ordinal Scale Data with Unbalanced Proportions using Ensemble Bagging Approach

Luthfia Hanun Yuli Arini, Solimun Solimun, Achmad Efendi, Mohammad Ohid Ullah

Abstract


CART (Classification and Regression Trees) is a decision tree algorithm used in data exploration. Unbalanced class proportions in the classification process can cause observations from the minority class to be misclassified. One way to overcome this data imbalance is to use an ensemble bagging algorithm. Bagging (bootstrap aggregating) repeatedly resamples the data, fits a classifier to each resample, and combines the results, which can reduce the bias caused by imbalanced data. The data are secondary data from Fernandes and Solimun's 2023 research report. The sample consists of 100 respondents, and the data have been tested for validity and reliability. The respondents were mothers with toddlers in Wajak District, Malang Regency. The results showed that the ensemble bagging CART method is better at handling the class imbalance, achieving an accuracy of 85%, a sensitivity of 94.1%, a specificity of 66.7%, and an F1-score of 78%. This research is limited to the Sumberputih Village area, so its results are representative only of the Wajak District area.
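The bagging idea described above (bootstrap resampling of CART trees combined by voting) and the four reported metrics can be illustrated with a short, self-contained Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes numeric features, integer class labels, a simplified CART-style tree grown with the Gini impurity criterion and a fixed depth limit (no cost-complexity pruning), and ordinary bootstrap resampling with majority voting. All function names and the parameters `n_trees`, `max_depth`, and `min_size` are hypothetical. The metric function uses the standard binary confusion-matrix definitions for one positive class; it does not reproduce the reported results, which come from the authors' own data and settings.

```python
import random
from collections import Counter

# Minimal sketch (assumptions): numeric features, integer class labels,
# hypothetical helper names and parameters. Not the authors' code.


def gini(labels):
    """Gini impurity of a list of class labels (CART's splitting criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold) with the lowest weighted Gini,
    or None if no split lowers the parent node's impurity."""
    n, best, best_score = len(y), None, gini(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Grow a binary tree: internal nodes are (feature, threshold, left, right),
    leaves are the majority class label. No pruning step is included."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    left = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_size)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_size)
    return (j, t, left, right)


def predict_tree(node, x):
    """Follow the splits down to a leaf (leaves are non-tuple labels)."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def bagging_fit(X, y, n_trees=25, seed=0, **tree_kwargs):
    """Bootstrap aggregating: fit one tree per bootstrap sample of size n."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], **tree_kwargs))
    return trees


def bagging_predict(trees, x):
    """Majority vote over the ensemble's predictions."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1 from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(a == positive and b == positive for a, b in pairs)
    tn = sum(a != positive and b != positive for a, b in pairs)
    fp = sum(a != positive and b == positive for a, b in pairs)
    fn = sum(a == positive and b != positive for a, b in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sens,
            "specificity": spec, "f1": f1}


if __name__ == "__main__":
    # Toy imbalanced data (8 majority vs 3 minority), purely illustrative.
    X = [[1], [2], [2], [3], [3], [4], [4], [5], [8], [9], [10]]
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    trees = bagging_fit(X, y, n_trees=15, seed=42, max_depth=2)
    print(binary_metrics(y, [bagging_predict(trees, x) for x in X]))
```

Plain bagging, as sketched here, does not rebalance the classes by itself: each tree sees a different bootstrap sample, and the majority vote smooths out the instability of individual trees.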

 


Keywords


Unbalanced; CART; Bagging.


References


Altman, N., & Krzywinski, M. (2017). Ensemble methods: bagging and random forests. Nature Methods, 14(10), 933–934. https://doi.org/10.1038/nmeth.4438

Arrahimi, A. R., Ihsan, M. K., Kartini, D., Faisal, M. R., & Indriani, F. (2019). Teknik Bagging Dan Boosting Pada Algoritma CART Untuk Klasifikasi Masa Studi Mahasiswa [Bagging and boosting techniques in the CART algorithm for classifying student study duration]. Jurnal Sains Dan Informatika, 5(1), 2598–5841. https://doi.org/10.34128/jsi.v5i1.171

Bramer, M. (2016). Principles of Data Mining (3rd ed.). Springer London. https://doi.org/10.1007/978-1-4471-7307-6

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification And Regression Trees. Routledge. https://doi.org/10.1201/9781315139470

Cendani, L. M., & Wibowo, A. (2022). Perbandingan Metode Ensemble Learning pada Klasifikasi Penyakit Diabetes [A comparison of ensemble learning methods for diabetes classification]. Jurnal Masyarakat Informatika, 13(1), 33–44. https://doi.org/10.14710/jmasif.13.1.42912

Chen, R.-C., Dewi, C., Huang, S.-W., & Caraka, R. E. (2020). Selecting critical features for data classification based on machine learning methods. Journal of Big Data, 7(1), 52. https://doi.org/10.1186/s40537-020-00327-4

Chilyabanyama, O. N., Chilengi, R., Simuyandi, M., Chisenga, C. C., Chirwa, M., Hamusonde, K., Saroj, R. K., Iqbal, N. T., Ngaruye, I., & Bosomprah, S. (2022). Performance of Machine Learning Classifiers in Classifying Stunting among Under-Five Children in Zambia. Children, 9(7), 1082. https://doi.org/10.3390/children9071082

Daniya, T., Geetha, M., & Kumar, K. S. (2020). Classification and regression trees with gini index. Advances in Mathematics: Scientific Journal, 9(10), 8237–8247. https://doi.org/10.37418/amsj.9.10.53

De Prado, M. L. (2018). Advances in financial machine learning. John Wiley & Sons.

du Plooy, R., & Venter, P. J. (2021). A Comparison of Artificial Neural Networks and Bootstrap Aggregating Ensembles in a Modern Financial Derivative Pricing Framework. Journal of Risk and Financial Management, 14(6), 254. https://doi.org/10.3390/jrfm14060254

Efendi, A., Fitriani, R., Naufal, H. I., & Rahayudi, B. (2020). Ensemble Adaboost In Classification And Regression Trees To Overcome Class Imbalance In Credit Status Of Bank Customers. Journal of Theoretical and Applied Information Technology, 98(17), 3428–3437.

Erickson, B. J., & Kitamura, F. (2021). Magician’s Corner: 9. Performance Metrics for Machine Learning Models. Radiology: Artificial Intelligence, 3(3), e200126. https://doi.org/10.1148/ryai.2021200126

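Fernandes, A. A. R., & Solimun. (2023). Menelisik Faktor-Faktor Penyebab Stunting pada Anak di Kecamatan Wajak: Integrasi Cluster dengan Path Analysis dengan Pendekatan Statistika dan Sains Data [Examining the factors causing stunting in children in Wajak District: Integrating cluster analysis with path analysis using a statistics and data science approach].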

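Fitriani, R. D., Yasin, H., & Tarno. (2021). Penanganan Klasifikasi Kelas Data Tidak Seimbang Dengan Random Oversampling Pada Naive Bayes (Studi Kasus: Status Peserta KB IUD di Kabupaten Kendal) [Handling imbalanced class data classification with random oversampling in naive Bayes (Case study: Status of IUD family planning participants in Kendal Regency)]. Jurnal Gaussian, 10(1), 11–20.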

Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). The Annals of Statistics, 28(2). https://doi.org/10.1214/aos/1016218223

Jafarzadeh, H., Mahdianpari, M., Gill, E., Mohammadimanesh, F., & Homayouni, S. (2021). Bagging and Boosting Ensemble Classifiers for Classification of Multispectral, Hyperspectral and PolSAR Data: A Comparative Evaluation. Remote Sensing, 13(21), 4405. https://doi.org/10.3390/rs13214405

Krzywinski, M., & Altman, N. (2017). Classification and regression trees. Nature Methods, 14(8), 757–758. https://doi.org/10.1038/nmeth.4370

Kumar, P., Bhatnagar, R., Gaur, K., & Bhatnagar, A. (2021). Classification of Imbalanced Data: Review of Methods and Applications. IOP Conference Series: Materials Science and Engineering, 1099(1), 012077. https://doi.org/10.1088/1757-899X/1099/1/012077

Kumari, S., Kumar, D., & Mittal, M. (2021). An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. International Journal of Cognitive Computing in Engineering, 2, 40–46. https://doi.org/10.1016/j.ijcce.2021.01.001

Li, X., & Zhang, L. (2021). Unbalanced data processing using deep sparse learning technique. Future Generation Computer Systems, 125, 480–484. https://doi.org/10.1016/j.future.2021.05.034

Luque, A., Carrasco, A., Martín, A., & de las Heras, A. (2019). The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 91, 216–231. https://doi.org/10.1016/j.patcog.2019.02.023

Mahmudiono, T., Sumarmi, S., & Rosenkranz, R. R. (2017). Household dietary diversity and child stunting in East Java, Indonesia. Asia Pacific Journal of Clinical Nutrition, 26(2), 317–325. https://search.informit.org/doi/10.3316/ielapa.688058173877148

Ngo, G., Beard, R., & Chandra, R. (2022). Evolutionary bagging for ensemble learning. Neurocomputing, 510, 1–14. https://doi.org/10.1016/j.neucom.2022.08.055

Siahaan, D., Wahyuningsih, S., & Amijaya, F. D. T. (2017). Aplikasi Classification and Regression Tree (CART) dan Regresi Logistik Ordinal dalam Bidang Pendidikan [Application of classification and regression tree (CART) and ordinal logistic regression in education]. EKSPONENSIAL, 7(1), 95–104. https://jurnal.fmipa.unmul.ac.id/index.php/exponensial/article/view/46

Siringoringo, R., & Jaya, I. K. (2018). Ensemble Learning Dengan Metode Smotebagging Pada Klasifikasi Data Tidak Seimbang [Ensemble learning with the SMOTEBagging method for imbalanced data classification]. Information System Development, 3(2), 75–81. https://ejournal-medan.uph.edu/index.php/isd/article/view/204

Wong, H. B., & Lim, G. H. (2011). Measures of Diagnostic Accuracy: Sensitivity, Specificity, PPV and NPV. Proceedings of Singapore Healthcare, 20(4), 316–318. https://doi.org/10.1177/201010581102000411

Zhao, C., Peng, R., & Wu, D. (2023). Bagging and Boosting Fine-tuning for Ensemble Learning. IEEE Transactions on Artificial Intelligence, 1–15. https://doi.org/10.1109/TAI.2023.3296685




DOI: https://doi.org/10.31764/jtam.v8i2.20201



Copyright (c) 2024 Luthfia Hanun Yuli Arini, Solimun, Achmad Efendi, Mohammad Ohid Ullah

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


JTAM (Jurnal Teori dan Aplikasi Matematika) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


JTAM (Jurnal Teori dan Aplikasi Matematika) Editorial Office: