Abstract
The presence of missing data in hydrometeorological datasets is a common problem, usually due to sensor malfunction, deficiencies in records storage and transmission, or other recovery procedures issues. These missing values are the primary source of problems when analyzing and modeling their spatial and temporal variability. Thus, accurate gap-filling techniques for rainfall time series are necessary to have complete datasets, which is crucial in studying climate change evolution. In this work, several machine learning models have been assessed to gap-fill rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). Based on the obtained results, the use of neighbor data, located within a 50 km radius, highly outperformed the rest of the assessed approaches, with RMSE (root mean squared error) values up to 1.246 mm/day, MBE (mean bias error) values up to −0.001 mm/day, and R2 values up to 0.898. Besides, inland area results outperformed coastal area in most locations, arising the efficiency effects based on the distance to the sea (up to an improvement of 63.89% in terms of RMSE). Finally, machine learning (ML) models (especially MLP (multilayer perceptron)) notably outperformed simple linear regression estimations in the coastal sites, whereas in inland locations, the improvements were not such significant.
Highlights
The spatial and temporal analysis of meteorological parameters, such as rainfall is crucial to numerous environmental, hydrological, and agroclimatic studies, as well as optimizing issues, such as water resource management or irrigation scheduling [1,2,3,4].one of the most common problems in time series analyses, such as rainfall datasets, is the presence of gaps of different widths, making this task harder to carry out.This usually results from malfunctioning sensors or data loggers, lack of maintenance, meteorological events, or power outages
One of the most frequent algorithms to estimate missing rainfall records is the inverse distance weighting method (IDWM), where the estimated values are calculated with a weighted average from neighbor stations [8,9]
In order to help the reproducibility of this work, the best machine learning (ML) models were uploaded to an open access repository in Github
Summary
One of the most common problems in time series analyses, such as rainfall datasets, is the presence of gaps of different widths, making this task harder to carry out This usually results from malfunctioning sensors or data loggers, lack of maintenance, meteorological events, or power outages. One of the most frequent algorithms to estimate missing rainfall records is the inverse distance weighting method (IDWM), where the estimated values are calculated with a weighted average (it resorts to the inverse of the distance when assigning the weights) from neighbor stations [8,9] Another simple method to apply is the gauge mean estimator, which uses an average value of observations from the nearby stations, which can be obtained by optimization, proximity metric, or correlation, among other techniques [10]
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have