Prediction of Maternity Recovery Rate of Group Long-Term Disability Insurance Using XGBoost

Felivia Kusnadi, Andry Wijaya, Julius Dharma Lesmono

Abstract


To help insurers set insurance rates that incorporate maternity factors, it is crucial to understand the maternity recovery rate, a metric used by insurance companies to understand how much of the expenses associated with maternity care and related medical services are covered by their policies. This paper employed Extreme Gradient Boosting (XGBoost), a powerful method for handling complex data relationships and preventing overfitting, to predict the maternity recovery rate on the North American Group Long-Term Disability dataset obtained from the Society of Actuaries, which lists maternity as one of its categories. For comparison, two other machine learning methods, Gradient Boosting Machine (GBM) and Bayesian Additive Regression Tree (BART), were also applied, with Root Mean Squared Error (RMSE) used to measure the difference between predicted and observed maternity recovery rates. Four datasets, three imbalanced and one fairly balanced, were created from the original dataset to test each method's predictive performance. The study revealed that XGBoost performed exceptionally well on the imbalanced datasets, while BART was slightly superior on the fairly balanced data. Furthermore, the model identified the duration, exposures, and age of participants as important variables both in predicting maternity recovery rates and in the underwriting process.
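As a minimal sketch of the comparison metric named above, RMSE can be computed as the square root of the mean squared difference between observed and predicted rates. The function name `rmse` and the numbers below are assumptions made up for illustration; they are not taken from the paper or its dataset.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    # Square each residual, average the squares, then take the square root.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical recovery rates, invented for illustration only.
observed_rates = [0.10, 0.20, 0.30, 0.40]
predicted_rates = [0.10, 0.20, 0.30, 0.80]
example_rmse = rmse(observed_rates, predicted_rates)
```

A lower RMSE indicates predictions closer to the observed rates, which is how the three methods are ranked against one another on each dataset.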

 


Keywords


Imbalanced Data; Maternity Recovery Rate; XGBoost; Variable Importance.





DOI: https://doi.org/10.31764/jtam.v7i4.16825



Copyright (c) 2023 Felivia Kusnadi, Andry Wijaya, Julius Dharma Lesmono

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JTAM (Jurnal Teori dan Aplikasi Matematika) 
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
