Abstract

Missing data is common in data analytics, including rainfall analysis, and can reduce accuracy by biasing estimates, which leads to invalid conclusions. Many methods have been proposed to handle missing data, but the same method might not be suitable for every dataset. This paper addresses that challenge with a comparative study of popular methods for handling missing values. These include removal of instances with missing values, arithmetic mean (AM), normal ratio (NR), nearest neighbor (NN) and inverse distance (ID). We propose a method of estimating missing values called step-5 simple moving average (step-5 SMA), which applies simple moving average (SMA) principles with consideration of the El Niño-Southern Oscillation (ENSO) phenomenon. After enhancing the training set with our method, we built monthly rainfall forecasting models using two supervised machine learning algorithms: support vector machines (SVM) and k-nearest neighbor (k-NN). We used monthly precipitation data (in mm) gathered between 1953 and 2013 from 26 water measurement stations of the Meteorological Department located in Northeast Thailand. Evaluated by mean absolute error (MAE) and root mean square error (RMSE), the monthly rainfall forecasters trained on the set from which observations with missing values had been removed performed worst. Enhancing the quality of the training set using our step-5 SMA gave better performance than the other missing value estimation methods.
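The abstract names the baseline estimators and error metrics without stating their formulas. As a rough illustration only, the sketch below uses the common textbook forms of the arithmetic mean, normal ratio and inverse distance estimators, together with MAE and RMSE. The function names, the argument layout and the distance exponent of 2 are assumptions made for this example and are not taken from the paper. The step-5 SMA method is not sketched, because the abstract does not define it.

```python
import math

def arithmetic_mean(neighbor_values):
    """Estimate a missing value as the plain average of neighboring stations."""
    return sum(neighbor_values) / len(neighbor_values)

def normal_ratio(target_normal, neighbor_values, neighbor_normals):
    """Textbook normal ratio: scale each neighbor's reading by the ratio of
    the target station's long-term normal to that neighbor's normal."""
    scaled = [
        (target_normal / normal) * value
        for value, normal in zip(neighbor_values, neighbor_normals)
    ]
    return sum(scaled) / len(scaled)

def inverse_distance(neighbor_values, distances, power=2):
    """Textbook inverse distance weighting; power=2 is an assumed default."""
    weights = [1.0 / (d ** power) for d in distances]
    weighted = sum(w * v for w, v in zip(weights, neighbor_values))
    return weighted / sum(weights)

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

For instance, `inverse_distance([100.0, 200.0], [10.0, 20.0])` weights the closer station four times as heavily as the farther one, so the estimate lands nearer to 100 mm than to 200 mm.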
