Abstract

Missing data is a common problem in hydrological studies, so data reconstruction is critical, especially when all available records, including incomplete ones, must be used. Missing data can also affect the results of statistical analysis, and the variability in the data may not be properly captured. This study therefore compared the performance of three imputation methods in estimating missing values in streamflow datasets: robust random regression imputation (RRRI), k-nearest neighbours (k-NN), and classification and regression tree (CART). Complete historical daily streamflow data from 2012 to 2014 were used as the training dataset to assess and validate the effectiveness of the imputation methods in addressing missing streamflow data. All three methods, each coupled with multiple linear regression (MLR), were then used to restore streamflow rates in Malaysia's Langat River Basin from 1978 to 2016. The effectiveness of the estimation techniques was evaluated using metrics including the Nash-Sutcliffe efficiency coefficient (CE), root-mean-square error (RMSE), and mean absolute percentage error (MAPE). The results showed that RRRI coupled with MLR (RRRI-MLR) had the lowest RMSE and MAPE values, outperforming all other techniques tested for filling missing data in daily streamflow datasets. This indicates that RRRI-MLR is the best of the tested methods for dealing with missing data in streamflow datasets.

DOI: 10.28991/cej-2021-03091747
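The three evaluation metrics named in the abstract have standard textbook definitions, which can be written out as a minimal sketch. The function names and the sample values below are illustrative assumptions, not taken from the study, and the authors' exact formulations may differ in detail.

```python
import math


def nash_sutcliffe(observed, estimated):
    """Nash-Sutcliffe efficiency (CE): 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, estimated):
    """Root-mean-square error, in the same units as the flow data."""
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(sse / len(observed))


def mape(observed, estimated):
    """Mean absolute percentage error, in percent (undefined if any observed value is zero)."""
    total = sum(abs((o - e) / o) for o, e in zip(observed, estimated))
    return 100.0 * total / len(observed)


# Hypothetical observed and imputed flows, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]

print(nash_sutcliffe(observed, estimated))
print(rmse(observed, estimated))
print(mape(observed, estimated))
```

A higher CE and lower RMSE and MAPE indicate a better imputation, which is the sense in which the abstract ranks RRRI-MLR first.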

Highlights

  • Missing data in hydrological models is a prevalent problem caused by natural disasters, improper operation, and battery drainage. It restricts hydrological analysis [1, 2] and has remained unsolved despite advances in missing data imputation techniques over the years [3]

  • The simulation proceeded as follows: training datasets with missing values were generated at missing data rates of 5, 10, 15, 20, 25, and 30%, and the missing values were then replaced with estimates obtained from each of the imputation methods

  • Missing data is a frequent constraint in hydrological research and often leads to misinterpretation of statistical outputs and hydrological modelling results
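The simulation flow described in the highlights (remove values at fixed rates, impute them, compare against the held-out truth) can be sketched roughly as below. Several things here are assumptions made for illustration: the data are synthetic, values are removed completely at random, and the imputer is a simple k-NN that averages the target flows of the k complete records with the closest values at a hypothetical neighbouring station. It is not the study's RRRI, k-NN, or CART implementation.

```python
import random


def mask_at_random(values, rate, rng):
    """Return a copy of values with round(rate * n) entries replaced by None."""
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def knn_impute(target, predictor, k=3):
    """Fill None entries in target with the mean target of the k complete
    records whose predictor values are closest."""
    complete = [(p, t) for p, t in zip(predictor, target) if t is not None]
    filled = []
    for p, t in zip(predictor, target):
        if t is None:
            nearest = sorted(complete, key=lambda pair: abs(pair[0] - p))[:k]
            t = sum(pair[1] for pair in nearest) / len(nearest)
        filled.append(t)
    return filled


rng = random.Random(0)

# Synthetic stand-in data (assumption): flow at a neighbouring station and a
# correlated flow at the target station.
predictor = [rng.uniform(5.0, 50.0) for _ in range(200)]
target = [1.5 * p + rng.gauss(0.0, 1.0) for p in predictor]

results = {}
for rate in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30):
    masked = mask_at_random(target, rate, rng)
    filled = knn_impute(masked, predictor)
    # Score only the positions that were artificially removed.
    held_out = [(t, f) for t, m, f in zip(target, masked, filled) if m is None]
    error = (sum((t - f) ** 2 for t, f in held_out) / len(held_out)) ** 0.5
    results[rate] = (len(held_out), error)
    print(f"{rate:.0%} missing: {len(held_out)} values imputed, RMSE = {error:.3f}")
```

Scoring only the masked positions keeps the error from being diluted by values that were never missing. Swapping `knn_impute` for another imputer would allow the same loop to compare methods across missing data rates.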

Introduction

Missing data in hydrological models is a prevalent problem caused by natural disasters, improper operation, and battery drainage. It restricts hydrological analysis [1, 2] and has remained unsolved despite advances in missing data imputation techniques over the years [3]. Missing data can pose severe problems in hydrological studies, resulting in uncertainty and low efficiency of water resource systems [4, 5, 6]. Even minor data gaps can prevent the computation of important summary statistics and hydrological indices, such as monthly runoff totals or n-day minimum flows, restricting the analysis and interpretation of historical flow variability [7]. For these reasons, gaps must be filled, and the handling of missing data should be prioritised in the data preparation procedure.
