Handling Missing Values using Weighted Linear Combination of KNN-SVD: A Case Study of Rainfall Data in West Java

Rizkian Agung Jamaesa, Sri Nurdiati, Elis Khatizah, Mohamad Khoirun Najib, Lilis Sri Wahyuni

Abstract


This experimental, comparative quantitative study evaluates missing-value imputation methods for daily rainfall data in West Java. Rainfall data are crucial for environmental policy, particularly flood control and water resource management. We used daily rainfall records from five BMKG stations in West Java. Although these stations provide accurate data through direct measurement, missing values often occur because of human error or equipment problems. To address this, we introduce an integrated imputation method that combines K-Nearest Neighbors (KNN) and Singular Value Decomposition (SVD) through a Weighted Linear Combination (WLC). This combined approach extends the single-model imputation methods used in earlier research. To test the model's performance, we split the dataset into training and testing sets using five ratios (95:5, 90:10, 80:20, 70:30, and 60:40), and we measured effectiveness using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The combined KNN–SVD method outperforms KNN or SVD alone at every split ratio. The best results were obtained with the 95:5 split, which gave the lowest MAE and RMSE values of 7.35 and 13.22, respectively. These results suggest that the integrated KNN–SVD imputation model enhances the reliability of rainfall datasets, thereby improving climate information for hydrological studies, disaster risk reduction, and policy-making in West Java.
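The abstract does not specify how the KNN and SVD estimates are computed, how they are combined, or how the weight is chosen. The following is a minimal Python/NumPy sketch under our own assumptions. Days are rows and stations are columns. KNN fills a missing cell with the mean of the k most similar days that observed that station. SVD uses an iterative low-rank reconstruction started from station means. The combined estimate is w * KNN + (1 - w) * SVD, with a single weight w chosen by grid search on a hidden validation subset of the training cells. The function names (knn_impute, svd_impute, wlc_combine, choose_weight), the defaults k = 5 and rank = 2, and the synthetic data are hypothetical. They are not the authors' implementation or the BMKG records.

```python
import numpy as np


def knn_impute(X, k=5):
    """Fill each NaN in X (rows = days, columns = stations) with the mean of
    the k closest days that observed the same station (assumed KNN variant)."""
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    out = X.copy()
    for i, j in zip(*np.where(~obs)):
        dists, vals = [], []
        for r in range(X.shape[0]):
            if r == i or not obs[r, j]:
                continue
            shared = obs[i] & obs[r]  # stations observed on both days
            if not shared.any():
                continue
            dists.append(np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2)))
            vals.append(X[r, j])
        if vals:
            nearest = np.argsort(dists)[:k]
            out[i, j] = np.mean(np.asarray(vals)[nearest])
        else:
            out[i, j] = np.nanmean(X[:, j])  # fallback: station mean
    return out


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Iterative rank-`rank` SVD imputation: start from station means, then
    repeatedly overwrite only the missing cells with a low-rank reconstruction."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        change = np.linalg.norm(filled[miss] - low_rank[miss])
        filled[miss] = low_rank[miss]
        if change < tol:
            break
    return filled


def wlc_combine(knn_filled, svd_filled, w):
    """Weighted linear combination w*KNN + (1-w)*SVD; observed cells are
    identical in both inputs, so only the imputed cells change."""
    return w * knn_filled + (1.0 - w) * svd_filled


def mae(y, yhat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))


def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))


def choose_weight(X, k=5, rank=2, val_frac=0.05, grid=None, seed=0):
    """Hide a random subset of observed cells and return the w in `grid`
    that minimises RMSE on them (assumed selection rule)."""
    X = np.asarray(X, dtype=float)
    grid = np.linspace(0.0, 1.0, 11) if grid is None else np.asarray(grid, dtype=float)
    rng = np.random.default_rng(seed)
    obs_idx = np.argwhere(~np.isnan(X))
    n_val = max(1, int(val_frac * len(obs_idx)))
    pick = obs_idx[rng.choice(len(obs_idx), size=n_val, replace=False)]
    rows, cols = pick[:, 0], pick[:, 1]
    X_val = X.copy()
    X_val[rows, cols] = np.nan
    knn, svd = knn_impute(X_val, k), svd_impute(X_val, rank)
    scores = [rmse(X[rows, cols], wlc_combine(knn, svd, w)[rows, cols]) for w in grid]
    return float(grid[int(np.argmin(scores))])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic, correlated "rainfall" for 5 stations over 365 days (not BMKG data).
    common = rng.gamma(shape=0.8, scale=10.0, size=(365, 1))
    X_true = np.clip(common + rng.normal(0, 3, size=(365, 5)), 0, None)
    # Hide 5% of cells as the test set, mimicking a 95:5 split.
    test = rng.random(X_true.shape) < 0.05
    X_train = np.where(test, np.nan, X_true)
    w = choose_weight(X_train)
    knn, svd = knn_impute(X_train), svd_impute(X_train)
    for name, P in [("KNN", knn), ("SVD", svd), ("KNN-SVD", wlc_combine(knn, svd, w))]:
        print(f"{name:8s} MAE={mae(X_true[test], P[test]):.2f} "
              f"RMSE={rmse(X_true[test], P[test]):.2f}")
```

Masking individual cells rather than whole days is only one possible reading of the training/testing split. Fitting w on a validation subset drawn from the training cells keeps the test cells out of the weight selection, so the reported MAE and RMSE are not tuned on the data they score.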

Keywords


Imputation; Missing Value; Integration; Rainfall Dataset.

DOI: https://doi.org/10.31764/jtam.v10i3.36708


Copyright (c) 2026 Rizkian Agung Jamaesa, Sri Nurdiati, Elis Khatizah, Mohamad Khoirun Najib, Lilis Sri Wahyuni

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JTAM (Jurnal Teori dan Aplikasi Matematika) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

JTAM (Jurnal Teori dan Aplikasi Matematika) Editorial Office: