Comparative Analysis of Decision Tree and Random Forest Algorithms for Diabetes Prediction

Aufar Faiq Fadhlullah; Triyanna Widiyaningtyas

doi:10.31764/jtam.v8i4.24388

Abstract

Diabetes mellitus is a chronic medical disorder marked by high blood glucose levels that raise the risk of early mortality and organ failure. It is a growing global health problem, so accurate and timely diagnosis is urgently needed. This experimental study applies data mining prediction techniques to diagnose diabetes mellitus. The prediction process consists of four stages: dataset collection, data pre-processing, data processing, and evaluation. Data were obtained from Electronic Health Records (EHRs), namely the public "Diabetes Prediction Dataset". Pre-processing involves data filtering, attribute conversion, and class selection. Data processing uses Random Forest and Decision Tree models to predict diabetes, and the models are evaluated using accuracy, precision, and recall. The Random Forest algorithm achieved an accuracy of 93.97%, a precision of 99.88%, and a recall of 66.56%, with a computation time of 16 s. The Decision Tree algorithm achieved an accuracy of 93.89%, a precision of 98.73%, and a recall of 66.88%, with a computation time of less than 1 s. Because the two algorithms differ only marginally in accuracy, precision, and recall, while the Decision Tree computes far faster, the study concludes that the Decision Tree algorithm is the more effective choice. Fast computation matters in diabetes detection because the diagnosis bears on a person's life.
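The three evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `binary_metrics` and the toy labels are hypothetical, chosen only to show how a classifier can combine high accuracy and precision with a much lower recall, the pattern reported for both models.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical toy labels (1 = diabetic, 0 = non-diabetic); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

In this toy case every positive prediction is correct, so precision is perfect, yet one of the three diabetic cases is missed, so recall is only two thirds. Accuracy stays high because the negative class dominates.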

Keywords

Diabetes Mellitus; Prediction Algorithm; Random Forest; Decision Tree.
