Abstract
In this study, the ability of several statistical and machine learning models to impute water quality data was investigated at three monitoring stations along the Langat River in Malaysia. The greatest obstacle was the inconsistent percentage of missing data across monitoring stations, which ranged from 20 percent (moderate) to over 50 percent (high). The main objective was to select the best imputation method and to determine whether the best-performing method differs between stations. The paper compares seven imputation methods: Multiple Predictive Mean Matching (PMM), Multiple Random Forest Imputation (RF), Multiple Bayesian Linear Regression Imputation (BLR), Multiple Linear Regression (non-Bayesian) Imputation (LRNB), Multiple Classification and Regression Tree (CART), k-nearest neighbours (kNN) and Bootstrap-based Expectation Maximisation (EMB). Among the seven techniques, kNN produces consistently reliable results: its imputed data are all rated ‘very good’ by the Nash-Sutcliffe efficiency (NSE > 0.75). This was confirmed by the percent bias, |PBIAS| < 5.30 (all imputed data rated ‘very good’), and the Kling-Gupta efficiency, KGE ≥ 0.87 (all imputations rated ‘good’). Imputation performance is strong at all three monitoring stations, with an index of agreement WI ≥ 0.94, despite the varying percentages of missing data. According to these findings, the kNN imputation approach outperforms the others and should be prioritised in practice. Future research with the existing methods could benefit from the addition of geographical data.
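The four goodness-of-fit measures quoted above can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so the definitions below are the commonly used ones and should be read as assumptions: NSE in its standard form, PBIAS with the observed-minus-imputed sign convention, KGE in its original 2009 formulation, and WI as Willmott's original index of agreement. The function names and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias (assumed convention: positive means underestimation)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form: correlation, variability, bias)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def willmott_index(obs, sim):
    """Willmott's index of agreement (assumed original form), between 0 and 1."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    denom = np.sum((np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return 1.0 - np.sum((obs - sim) ** 2) / denom

# Hypothetical values: observations that were masked, and their imputed estimates.
observed = np.array([6.8, 7.1, 6.5, 7.4, 6.9])
imputed = np.array([6.7, 7.0, 6.6, 7.3, 7.0])

print("NSE  :", nse(observed, imputed))
print("PBIAS:", pbias(observed, imputed))
print("KGE  :", kge(observed, imputed))
print("WI   :", willmott_index(observed, imputed))
```

In an evaluation of this kind, known values are typically masked, imputed, and then compared with the originals, so that each method can be scored against thresholds such as those quoted in the abstract.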