Abstract

Accurate precipitation records are essential for monitoring the climate and studying its changes. However, analysis is typically limited by large quantities of missing values. This article proposes two new imputation techniques for incomplete monthly data collected from a rainfall monitoring network in the Republic of Ireland from 1981 to 2010. The data are high‐dimensional, comprising over 1100 rain gauge stations, and the methods presented are designed to handle such cases. These are Elastic‐Net Chained Equations (ENCE) and Multiple Imputation by Chained Equations with Direct use of Regularized Regression by elastic‐net (MICE DURR). Both methods predict missing data through a series of regularized regression models; MICE DURR differs from ENCE by also using multiple imputation. In evaluations across different levels of missingness, ENCE and MICE DURR consistently outperformed existing imputation methods in terms of RMSE and . They also gave the best results both seasonally and when predicting extreme values. RMSEs of 14.16 and 14.17 mm per month were reported for ENCE and MICE DURR, respectively, when stations that were at least 50% complete during the study period were included. As the data become sparser, the imputation accuracy of MICE DURR surpasses that of ENCE, demonstrating the efficacy of multiple imputation when handling a substantial amount of missing data. Validation metrics indicate that these methods compare very favourably with existing methods in the literature, such as those that use random forests or multiple linear regression.
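
The idea of predicting missing values "by a series of regularized regression models" can be illustrated with a minimal sketch. It is not the authors' ENCE or MICE DURR implementation. It assumes a months-by-stations array with NaN for missing values, a small hand-written coordinate-descent elastic net, and illustrative penalty settings (`alpha`, `l1_ratio`) that are not taken from the article. The function names `elastic_net_fit`, `chained_elastic_net_impute`, and `rmse` are hypothetical. The sketch produces a single imputation, so it is closer in spirit to ENCE. A multiple-imputation variant would repeat the procedure with added randomness (for example, bootstrap resampling) and pool the results.

```python
import numpy as np


def elastic_net_fit(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Elastic-net regression by cyclic coordinate descent.

    Minimises (1/2n)||y - b0 - Xb||^2 + alpha*l1_ratio*||b||_1
    + 0.5*alpha*(1 - l1_ratio)*||b||^2. Returns (intercept, coefficients).
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()                        # residual for the current beta
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                     # constant predictor: keep at zero
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - l1, 0.0) / (col_sq[j] + l2)
            resid += Xc[:, j] * (beta[j] - new)
            beta[j] = new
    return y_mean - x_mean @ beta, beta


def chained_elastic_net_impute(data, n_rounds=5, alpha=0.1, l1_ratio=0.5):
    """Single imputation by chained elastic-net regressions.

    Assumes every column (station) has at least one observed value.
    """
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Start from column means, then refine each station in turn.
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_rounds):
        for j in range(data.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any() or miss_j.all():
                continue
            others = np.delete(filled, j, axis=1)
            b0, beta = elastic_net_fit(
                others[~miss_j], filled[~miss_j, j], alpha, l1_ratio
            )
            filled[miss_j, j] = b0 + others[miss_j] @ beta
    return filled


def rmse(truth, estimate):
    """Root mean squared error between two arrays of equal shape."""
    diff = np.asarray(truth) - np.asarray(estimate)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Synthetic stand-in for station data: a shared signal plus station noise.
    rng = np.random.default_rng(1)
    shared = rng.normal(size=(120, 1))
    truth = 100.0 + 30.0 * shared + rng.normal(scale=5.0, size=(120, 5))
    mask = rng.random(truth.shape) < 0.2
    observed = truth.copy()
    observed[mask] = np.nan
    imputed = chained_elastic_net_impute(observed)
    print("RMSE on held-out entries:", rmse(truth[mask], imputed[mask]))
```

Scoring only the entries that were deliberately masked, as in the demo, is one common way to compare imputation methods by RMSE at different levels of missingness.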