Comparative Analysis of Decision Tree and Random Forest Algorithms for Diabetes Prediction
Abstract
Diabetes mellitus is a chronic medical disorder marked by high blood glucose levels, which raise the risk of early mortality and organ failure. It is a growing global health problem, so accurate and timely diagnosis is urgently needed. This experimental study aims to diagnose diabetes mellitus using data mining prediction techniques. The prediction workflow consists of four stages: dataset collection, data pre-processing, data processing, and evaluation. The data come from Electronic Health Records (EHRs), namely the public "Diabetes Prediction Dataset". The pre-processing stage involves data filtering, attribute conversion, and class selection. The data processing stage applies Random Forest and Decision Tree models to predict diabetes. The models were evaluated using accuracy, precision, and recall. The Random Forest algorithm achieved an accuracy of 93.97%, a precision of 99.88%, and a recall of 66.56%, with a computation time of 16 s. The Decision Tree algorithm achieved an accuracy of 93.89%, a precision of 98.73%, and a recall of 66.88%, with a computation time of less than 1 s. Because the differences in accuracy, precision, and recall between the two algorithms are not significant, the Decision Tree algorithm is judged the more effective of the two: it needs far less computation time, which matters in diabetes detection, where a timely result can affect a patient's life.
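The three evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' implementation: the `ConfusionCounts` class, the function names, and the counts are hypothetical values chosen only for illustration and are not taken from the study's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for this sketch)."""
    tp: int  # diabetic cases correctly predicted as diabetic
    fp: int  # non-diabetic cases wrongly predicted as diabetic
    fn: int  # diabetic cases wrongly predicted as non-diabetic
    tn: int  # non-diabetic cases correctly predicted as non-diabetic


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of positive predictions that are truly positive."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of truly positive cases that the model finds."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


# Illustrative counts only (assumed, not from the paper).
counts = ConfusionCounts(tp=60, fp=2, fn=30, tn=908)

print(f"accuracy:  {accuracy(counts):.2%}")
print(f"precision: {precision(counts):.2%}")
print(f"recall:    {recall(counts):.2%}")
```

With counts of this shape, precision is high while recall is much lower, because few positive predictions are wrong but many true positive cases are missed. Accuracy can still look strong when the negative class dominates the data.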
DOI: https://doi.org/10.31764/jtam.v8i4.24388
Copyright (c) 2024 Aufar Faiq Fadhlullah, Triyanna Widiyaningtyas
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
JTAM (Jurnal Teori dan Aplikasi Matematika)