Robust Continuum Regression Study of LASSO Selection and WLAD LASSO on High-Dimensional Data Containing Outliers
Abstract
In research, we often encounter multicollinearity and outliers, which can make regression coefficients unstable and reduce model performance. Robust Continuum Regression (RCR) addresses multicollinearity by compressing the explanatory variables into a much smaller set of mutually independent latent variables and applying robust regression techniques, so that the complexity of the regression model is reduced without losing essential information from the data and the parameter estimates become more stable. However, RCR is computationally burdensome when the data are very high dimensional (p >> n), so an initial dimension-reduction step that selects variables is needed. The Least Absolute Shrinkage and Selection Operator (LASSO) can perform this selection but is sensitive to outliers, which can lead to errors in choosing the significant variables. A selection method that is robust to outliers is therefore needed, such as Weighted Least Absolute Deviations with a LASSO penalty (WLAD LASSO), which selects variables by considering the absolute deviations of the residuals. This study aims to overcome multicollinearity and model instability in high-dimensional data while remaining resistant to outliers, combining the outlier resistance of RCR with the variable-selection capabilities of LASSO and WLAD LASSO to provide a more reliable and efficient solution for complex data analysis. To measure the performance of RCR-LASSO and RCR-WLAD LASSO, simulations were carried out on low-dimensional and high-dimensional data under two scenarios, namely without outliers (δ = 0%) and with outliers (δ = 10%, 20%, 30%), at three correlation levels (ρ = 0.1, 0.5, 0.9).
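The simulation design described above (correlated predictors at level ρ, a δ fraction of outliers, p >> n) can be sketched as follows. This is a minimal illustration, not the paper's exact design: the abstract does not specify the covariance structure or the contamination mechanism, so a compound-symmetric correlation matrix and mean-shifted errors are assumed here, and the function name `simulate_data` is hypothetical. The paper generates its data in R with the "MASS" package; Python with NumPy is used as a stand-in.

```python
import numpy as np

def simulate_data(n, p, rho, delta, n_relevant=5, seed=0):
    """Generate sparse regression data with correlated predictors and a
    delta fraction of outlying observations (assumed design, for
    illustration only)."""
    rng = np.random.default_rng(seed)
    # Compound-symmetric covariance: every pair of predictors has correlation rho.
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Sparse true coefficient vector: only the first n_relevant predictors matter.
    beta = np.zeros(p)
    beta[:n_relevant] = 1.0
    eps = rng.normal(0.0, 1.0, n)
    # Contaminate a delta fraction of the errors with a large mean shift.
    n_out = int(delta * n)
    if n_out > 0:
        idx = rng.choice(n, n_out, replace=False)
        eps[idx] += rng.normal(10.0, 1.0, n_out)
    y = X @ beta + eps
    return X, y, beta

# High-dimensional scenario with moderate correlation and 10% outliers.
X, y, beta = simulate_data(n=50, p=200, rho=0.5, delta=0.10)
```

Varying `rho` over {0.1, 0.5, 0.9} and `delta` over {0, 0.10, 0.20, 0.30} reproduces the grid of scenarios studied in the paper.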
The analysis was carried out in R version 4.1.3, using the "MASS" package to generate multivariate normal data, the "glmnet" package for LASSO variable selection, and the "MTE" package for WLAD LASSO variable selection. The simulation results show that RCR-LASSO tends to be superior to RCR-WLAD LASSO in terms of model goodness of fit, although its performance tends to decline as the proportion of outliers and the level of correlation increase. RCR-LASSO tends to be looser in selecting relevant variables, resulting in a simpler model, but some of the variables it chooses are only marginally significant. RCR-WLAD LASSO is stricter in variable selection and selects only significant variables, but it ignores several variables that have a small yet significant impact on the model.
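The LASSO screening stage can be illustrated with a short sketch. The paper performs this step in R with "glmnet" (and WLAD LASSO with "MTE", which has no common Python counterpart); here scikit-learn's `LassoCV` is assumed as a stand-in, with cross-validation choosing the penalty before the surviving predictors would be passed to the RCR stage.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 50, 200                      # high-dimensional: p >> n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                      # five truly relevant predictors
y = X @ beta + rng.normal(size=n)

# Cross-validated LASSO: coefficients shrunk exactly to zero are dropped,
# and the remaining predictors form the reduced input for RCR.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print("number of selected predictors:", selected.size)
```

Without outliers the strong predictors are recovered reliably; the paper's point is that contaminated data can push this selection toward the wrong variables, motivating the robust WLAD LASSO alternative.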
DOI: https://doi.org/10.31764/jtam.v8i3.23123
Copyright (c) 2024 Nurmai Syaroh Daulay, Erfiani, Agus M Soleh
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.