Abstract

An established way of improving the accuracy of gridded satellite precipitation products is to "correct" them using ground-based precipitation measurements together with machine and statistical learning algorithms. Such corrections are made in regression settings, where the ground-based measurements serve as the dependent variable and the satellite data as predictor variables. Comparisons of machine and statistical learning algorithms aimed at producing the most useful precipitation datasets through such corrections are regularly conducted in the literature. Nonetheless, most of these comparisons consider only a small number of algorithms and examine small geographical regions and limited time periods. Thus, their results tend to be of local importance and do not offer more general guidance. To provide generalizable results, we compared eight state-of-the-art machine and statistical learning algorithms in correcting satellite precipitation data for the entire contiguous United States over a 15-year period. We used monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) gridded dataset and the Global Historical Climatology Network monthly database, version 2 (GHCNm). Our results suggest that extreme gradient boosting (XGBoost) and random forests are more accurate than the remaining algorithms, which can be ordered from best to worst as follows: Bayesian regularized feed-forward neural networks, multivariate adaptive polynomial splines (poly-MARS), gradient boosting machines (gbm), multivariate adaptive regression splines (MARS), feed-forward neural networks, and linear regression.
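The regression setup described above can be sketched minimally as follows. This is a hypothetical illustration, not the paper's implementation: it fits the weakest baseline in the comparison, a simple linear regression, with closed-form least squares on synthetic monthly values; in the study, stronger learners such as random forests and XGBoost take the place of the fit/predict step, with the gauge measurements as the dependent variable and the satellite estimates as predictors.

```python
# Hypothetical sketch of the correction-as-regression setup.
# Dependent variable: ground-based gauge measurements.
# Predictor: satellite precipitation estimates.
# All data below are synthetic and for illustration only.

def fit_simple_linear(x, y):
    """Closed-form least squares for y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

# Synthetic example: the satellite product systematically
# underestimates the gauge measurements (values in mm/month).
satellite = [10.0, 25.0, 40.0, 55.0, 70.0]  # predictor variable
gauge     = [14.0, 32.0, 49.0, 66.0, 84.0]  # dependent variable

a, b = fit_simple_linear(satellite, gauge)
corrected = [a + b * s for s in satellite]

# The corrected product should be closer to the gauges than the raw one.
print(mse(corrected, gauge) < mse(satellite, gauge))  # → True
```

In practice the correction is fitted on held-out station data and applied grid-cell by grid-cell; swapping `fit_simple_linear` for a tree-ensemble learner is what separates the top-ranked algorithms from this baseline.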
