Comparative Evaluation of Eigenvector Selection in Eigenvector Spatial Filtering using a Gradient Boosting Machine for PM2.5 Concentration Prediction

Putri Nisrina Az-Zahra, Anik Djuraidah, Erfiani Erfiani

Abstract


Spatial dependence remains a critical issue in spatial data analysis. To address it, various eigenvector selection methods have been proposed within the Eigenvector Spatial Filtering (ESF) framework. However, these methods often give no explicit information about the individual contribution of each spatial component, which limits model interpretability, particularly when the number of candidate eigenvectors is large and the model is complex. In addition, ESF is limited in its ability to capture the nonlinear relationships and complex interactions inherent in spatial data, and its integration with advanced feature selection methods in machine learning frameworks remains underexplored. This quantitative empirical study evaluates different eigenvector selection methods within ESF integrated with a Gradient Boosting Machine (GBM) model for predicting PM2.5 concentrations in DKI Jakarta. Data were collected from 100 monitoring stations across five administrative regions for the first half of 2025. Spatial eigenvectors were derived from a spatial weights matrix and selected using four methods: positive eigenvalues, Moran’s Index significance, LASSO regression, and SHAP values obtained from the GBM model. Model performance was assessed using both 10-fold random cross-validation and spatial blocked cross-validation to evaluate predictive accuracy and spatial generalization. Adding spatial eigenvectors significantly improved model performance relative to models without spatial components. Under 10-fold cross-validation, the SHAP-based selection method achieved the highest predictive accuracy (R² = 0.619), effectively capturing spatial dependence and nonlinear relationships. The SHAP-based method also proved robust, selecting stable and consistent spatial components across different regions.
These findings highlight the methodological advantage of integrating ESF with machine learning and SHAP-based feature selection, offering a more interpretable and robust framework for spatial modelling. Practically, the improved prediction of PM2.5 concentrations can support more accurate air quality assessments and inform environmental management strategies in urban areas.
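
As an illustration of the first selection step named in the abstract (eigenvectors of a spatial weights matrix, filtered by positive eigenvalues), the following is a minimal sketch and not the authors' implementation. It assumes a symmetrized binary k-nearest-neighbour weights matrix and synthetic station coordinates; the function names, the choice k = 4, and the tolerance are hypothetical and do not come from the paper.

```python
import numpy as np


def moran_eigenvectors(coords, k=4):
    """Eigen-decompose the doubly centered weights matrix M W M.

    Assumption: W is a symmetrized binary k-nearest-neighbour matrix;
    the paper's actual weighting scheme is not stated in the abstract.
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]

    # Pairwise Euclidean distances; a point is never its own neighbour.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Binary k-nearest-neighbour weights, symmetrized.
    W = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, :k]
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Centering projector M = I - 11'/n, then eigen-decomposition of M W M.
    M = np.eye(n) - np.ones((n, n)) / n
    C = M @ W @ M
    C = (C + C.T) / 2.0  # remove floating-point asymmetry before eigh
    eigvals, eigvecs = np.linalg.eigh(C)

    # Sort from largest to smallest eigenvalue.
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def select_positive(eigvals, eigvecs, tol=1e-8):
    """Keep eigenvectors whose eigenvalue is positive (positive spatial autocorrelation)."""
    keep = eigvals > tol
    return eigvals[keep], eigvecs[:, keep]


# Synthetic stand-in for 100 monitoring-station coordinates (not real data).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(100, 2))

eigvals, eigvecs = moran_eigenvectors(coords, k=4)
pos_vals, pos_vecs = select_positive(eigvals, eigvecs)
print(pos_vecs.shape)  # (number of stations, number of candidate eigenvectors kept)
```

The columns of `pos_vecs` would then serve as candidate spatial features alongside the other predictors; the Moran's Index, LASSO, and SHAP-based methods described above would narrow this candidate set further.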

Keywords


Eigenvector Spatial Filtering; Gradient Boosting Machine; SHAP; PM2.5.

DOI: https://doi.org/10.31764/jtam.v10i3.38883

Copyright (c) 2026 Putri Nisrina Az-Zahra, Anik Djuraidah, Erfiani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
