Robust Continuum Regression Study of LASSO Selection and WLAD LASSO on High-Dimensional Data Containing Outliers
Abstract
In research, we often encounter multicollinearity and outliers, which can make regression coefficients unstable and reduce model performance. Robust Continuum Regression (RCR) addresses multicollinearity by compressing the explanatory variables into a much smaller set of mutually independent latent variables and applying robust regression techniques, so that the complexity of the regression model is reduced without losing essential information from the data and the parameter estimates become more stable. However, RCR is computationally burdensome when the data are very high dimensional (p >> n), so an initial dimension-reduction step that selects variables is needed. The Least Absolute Shrinkage and Selection Operator (LASSO) can perform this selection but is sensitive to outliers, which can lead to errors in choosing the significant variables. A selection method that is robust to outliers is therefore needed, such as Weighted Least Absolute Deviations with a LASSO penalty (WLAD LASSO), which selects variables by considering the absolute deviations of the residuals. This study aims to overcome multicollinearity and model instability in high-dimensional data while remaining resistant to outliers, combining the outlier resistance of RCR with the variable-selection capabilities of LASSO and WLAD LASSO to provide a more reliable and efficient solution for complex data analysis. To measure the performance of RCR-LASSO and RCR-WLAD LASSO, simulations were carried out on low-dimensional and high-dimensional data under two scenarios, namely without outliers (δ = 0%) and with outliers (δ = 10%, 20%, 30%), at three correlation levels (ρ = 0.1, 0.5, 0.9).
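The simulation design described above (correlated predictors at level ρ, a δ fraction of outliers, p >> n) can be sketched as follows. This is a minimal illustration, not the paper's exact design: the abstract does not specify the covariance structure or the contamination mechanism, so a compound-symmetric correlation matrix and mean-shifted errors are assumed here, and the function name `simulate_data` is hypothetical. The paper generates its data in R with the "MASS" package; Python with NumPy is used as a stand-in.

```python
import numpy as np

def simulate_data(n, p, rho, delta, n_relevant=5, seed=0):
    """Generate sparse regression data with correlated predictors and a
    delta fraction of outlying observations (assumed design, for
    illustration only)."""
    rng = np.random.default_rng(seed)
    # Compound-symmetric covariance: every pair of predictors has correlation rho.
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Sparse true coefficient vector: only the first n_relevant predictors matter.
    beta = np.zeros(p)
    beta[:n_relevant] = 1.0
    eps = rng.normal(0.0, 1.0, n)
    # Contaminate a delta fraction of the errors with a large mean shift.
    n_out = int(delta * n)
    if n_out > 0:
        idx = rng.choice(n, n_out, replace=False)
        eps[idx] += rng.normal(10.0, 1.0, n_out)
    y = X @ beta + eps
    return X, y, beta

# High-dimensional scenario with moderate correlation and 10% outliers.
X, y, beta = simulate_data(n=50, p=200, rho=0.5, delta=0.10)
```

Varying `rho` over {0.1, 0.5, 0.9} and `delta` over {0, 0.10, 0.20, 0.30} reproduces the grid of scenarios studied in the paper.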
The analysis was carried out in R version 4.1.3, using the "MASS" package to generate multivariate normal data, the "glmnet" package for LASSO variable selection, and the "MTE" package for WLAD LASSO variable selection. The simulation results show that RCR-LASSO tends to be superior to RCR-WLAD LASSO in terms of model goodness of fit, although its performance tends to decline as the proportion of outliers and the level of correlation increase. RCR-LASSO tends to be looser in selecting relevant variables, resulting in a simpler model, but some of the variables it chooses are only marginally significant. RCR-WLAD LASSO is stricter in variable selection and selects only significant variables, but it ignores several variables that have a small yet significant impact on the model.
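The LASSO screening stage can be illustrated with a short sketch. The paper performs this step in R with "glmnet" (and WLAD LASSO with "MTE", which has no common Python counterpart); here scikit-learn's `LassoCV` is assumed as a stand-in, with cross-validation choosing the penalty before the surviving predictors would be passed to the RCR stage.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 50, 200                      # high-dimensional: p >> n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                      # five truly relevant predictors
y = X @ beta + rng.normal(size=n)

# Cross-validated LASSO: coefficients shrunk exactly to zero are dropped,
# and the remaining predictors form the reduced input for RCR.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print("number of selected predictors:", selected.size)
```

Without outliers the strong predictors are recovered reliably; the paper's point is that contaminated data can push this selection toward the wrong variables, motivating the robust WLAD LASSO alternative.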
DOI: https://doi.org/10.31764/jtam.v8i3.23123
Copyright (c) 2024 Nurmai Syaroh Daulay, Erfiani, Agus M Soleh
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.