Prediction of Maternity Recovery Rate of Group Long-Term Disability Insurance Using XGBoost
Abstract
To help insurers determine insurance rates that incorporate maternity factors, it is crucial to understand the maternity recovery rate, a metric insurance companies use to gauge how much of the expenses associated with maternity care and related medical services is covered by their policies. This paper applied Extreme Gradient Boosting (XGBoost), a method well suited to handling complex data relationships and limiting overfitting, to the North American Group Long-Term Disability dataset obtained from the Society of Actuaries, which lists maternity as one of its categories, to predict the maternity recovery rate. For comparison, two other machine learning methods, Gradient Boosting Machine (GBM) and Bayesian Additive Regression Tree (BART), were also used, with Root Mean Squared Error (RMSE) measuring the difference between predicted and observed maternity recovery rates. Four datasets, three imbalanced and one fairly balanced, were created from the original dataset to test each method's predictive performance. The study found that XGBoost performed exceptionally well on the imbalanced datasets, while BART was slightly superior on the fairly balanced data. Furthermore, the model identified the duration, exposures, and age of participants as important factors in both predicting maternity recovery rates and the underwriting process.
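The RMSE comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the recovery rates and the names `model_a` and `model_b` are hypothetical values invented for illustration, and only the Python standard library is used.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("inputs must not be empty")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed recovery rates (not taken from the paper's dataset).
observed = [0.10, 0.20, 0.30, 0.40]

# Hypothetical predictions from two placeholder models.
predictions = {
    "model_a": [0.10, 0.20, 0.30, 0.40],
    "model_b": [0.20, 0.30, 0.40, 0.50],
}

# Score each model and pick the one with the smallest RMSE.
scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

A lower RMSE means the predicted rates sit closer to the observed ones, which is the basis on which the abstract ranks XGBoost, GBM, and BART on each dataset.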
DOI: https://doi.org/10.31764/jtam.v7i4.16825
Copyright (c) 2023 Felivia Kusnadi, Andry Wijaya, Julius Dharma Lesmono
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
JTAM (Jurnal Teori dan Aplikasi Matematika)