Abstract

Although many multi-source precipitation products (MSPs) with high spatio-temporal resolution have been extensively used in water cycle research, they are still subject to considerable uncertainties due to the spatial variability of terrain. Effective detection of precipitation occurrence is the key to enhancing precipitation accuracy. This study presents a two-step merging strategy that combines MSPs (GSMaP, IMERG, TMPA, PERSIANN-CDR, CMORPH, CHIRPS, and ERA-Interim) with rain gauge observations to improve both the detection of precipitation occurrence and the estimation of precipitation intensity over China during 2000–2017. Multiple environmental variables and the spatial autocorrelation between precipitation observations are used as auxiliary variables in the merging process. Three machine learning (ML) algorithms, gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and random forest (RF), are adopted as classification and regression models and compared. The strategy first employs classification models to identify wet and dry days in the warm and cold seasons, then applies regression models to predict precipitation amounts on the wet days. The results are also compared with those of traditional methods, including multiple linear regression (MLR), ML regression models, and gauge-based Kriging interpolation. A total of 1680 (70 %) rain gauges are randomly chosen for model training and 692 (30 %) for performance evaluation. The results show that: (1) The multi-source merged precipitation products (MSMPs) detect precipitation occurrence best under different intensities, followed by Kriging and then the original MSPs. The average Heidke Skill Score (HSS) of MSPs, Kriging, and MSMPs is 0.30–0.69, 0.71, and 0.79–0.80, respectively. (2) The proposed method significantly reduces the bias and deviation of the original MSPs in both time and space. The MSMPs correlate strongly with gauge observations, with a correlation coefficient (CC) of 0.85. Moreover, the modified Kling-Gupta efficiency (KGE) improves by 17 %–62 % (MSMPs: 0.74–0.76) compared with MSPs (0.34–0.65). (3) The spatial autocorrelation factor (KP) is the most important variable in the models and contributes considerably to model accuracy. (4) The proposed method outperforms MLR and ML regression models, and the XGBoost algorithm is recommended for large-scale data merging owing to its high computational efficiency. This study provides a robust and reliable method to improve the performance of precipitation data with full consideration of multi-source information. The method could be applied globally to produce large-scale precipitation products wherever rain gauges are available.
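
The control flow of the two-step strategy (classify wet and dry days per season, then predict amounts only on wet days) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `two_step_merge`, the toy rules `warm_rule` and `cold_rule`, the stand-in regressor `mean_amount`, and the clamping of negative predictions to zero are all assumptions made for illustration. In practice the callables would wrap trained GBDT, XGBoost, or RF models.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]  # e.g. one day's MSP estimates at a grid cell


def two_step_merge(
    samples: List[Features],
    seasons: List[str],
    wet_classifiers: Dict[str, Callable[[Features], bool]],
    amount_regressor: Callable[[Features], float],
) -> List[float]:
    """Step 1: classify each day as wet or dry with a season-specific model.
    Step 2: run the regression model only on days classified as wet."""
    merged = []
    for x, season in zip(samples, seasons):
        if not wet_classifiers[season](x):
            merged.append(0.0)  # dry day: amount is set to zero, no regression
        else:
            # Assumption: negative regression output is clamped to zero.
            merged.append(max(0.0, amount_regressor(x)))
    return merged


# Hypothetical stand-ins for trained models (illustration only).
def warm_rule(x: Features) -> bool:
    return sum(v > 0.1 for v in x) >= 2  # at least two products report rain


def cold_rule(x: Features) -> bool:
    return sum(v > 0.1 for v in x) >= 1  # at least one product reports rain


def mean_amount(x: Features) -> float:
    return sum(x) / len(x)  # simple average of the product estimates


samples = [[0.0, 0.2, 0.0], [3.0, 5.0, 4.0], [0.0, 0.0, 0.0], [0.0, 1.5, 0.0]]
seasons = ["warm", "warm", "cold", "cold"]
merged = two_step_merge(
    samples, seasons, {"warm": warm_rule, "cold": cold_rule}, mean_amount
)
print(merged)  # [0.0, 4.0, 0.0, 0.5]
```

Separating the two steps means that a day classified as dry is never assigned a small spurious amount by the regression model, which is the motivation the abstract gives for detecting occurrence first.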

Highlights

  • As one of the critical parameters of the natural water cycle, precipitation helps us realistically understand the interaction between hydrological and climate systems

  • This study proposes a two-step merging strategy that improves both the detection of precipitation occurrence and the estimation of precipitation intensity over China by combining multi-source precipitation products (MSPs) with relatively high-density rain gauges using machine learning (ML) algorithms

  • The categorical metrics focus on analyzing the ability of products to capture precipitation events, including the probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), Precision, frequency bias (FB), Heidke Skill Score (HSS), and classification accuracy (Accuracy)
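
The categorical metrics listed above are all derived from a 2×2 contingency table of observed versus estimated wet and dry days. The sketch below uses the standard forecast-verification formulas; it is assumed, not confirmed by this page, that the paper uses the same definitions. The function name `categorical_metrics` and the 0.1 mm/day wet-day threshold are hypothetical choices for illustration.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def categorical_metrics(
    observed: Sequence[float],
    predicted: Sequence[float],
    wet_threshold: float = 0.1,  # assumed wet-day threshold in mm/day
) -> Dict[str, float]:
    # Build the 2x2 contingency table.
    hits = false_alarms = misses = correct_negatives = 0
    for obs, pred in zip(observed, predicted):
        obs_wet, pred_wet = obs >= wet_threshold, pred >= wet_threshold
        if obs_wet and pred_wet:
            hits += 1
        elif pred_wet:
            false_alarms += 1
        elif obs_wet:
            misses += 1
        else:
            correct_negatives += 1

    a, b, c, d = hits, false_alarms, misses, correct_negatives
    return {
        "POD": _ratio(a, a + c),        # fraction of observed events detected
        "FAR": _ratio(b, a + b),        # fraction of predicted events that were false
        "CSI": _ratio(a, a + b + c),    # hits relative to all observed or predicted events
        "Precision": _ratio(a, a + b),  # fraction of predicted events that were correct
        "FB": _ratio(a + b, a + c),     # predicted event count / observed event count
        "Accuracy": _ratio(a + d, a + b + c + d),
        "HSS": _ratio(2 * (a * d - b * c), (a + c) * (c + d) + (a + b) * (b + d)),
    }


# Ten days: 4 hits, 1 miss, 1 false alarm, 4 correct negatives.
observed = [1.0, 2.0, 0.5, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]
predicted = [0.8, 2.5, 0.3, 2.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0]
metrics = categorical_metrics(observed, predicted)
print(metrics)
```

For this toy sample, POD, Precision, and Accuracy are 0.8, FAR is 0.2, FB is 1.0, CSI is 4/6, and HSS is 0.6. HSS measures skill relative to random chance, so 1 indicates perfect detection and 0 indicates no skill beyond chance.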

Summary

Abstract

Although many multi-source precipitation products (MSPs) with high spatio-temporal resolution have been extensively used in water cycle research, they are still subject to considerable uncertainties due to the spatial variability of terrain. The results are compared with those of traditional methods, including multiple linear regression (MLR), ML regression models, and gauge-based Kriging interpolation. The results show that: (1) The multi-source merged precipitation products (MSMPs) detect precipitation occurrence best under different intensities, followed by Kriging and then the original MSPs. (4) The proposed method outperforms MLR and ML regression models, and the XGBoost algorithm is recommended for large-scale data merging owing to its high computational efficiency. This study provides a robust and reliable method to improve the performance of precipitation data with full consideration of multi-source information. The method could be applied globally to produce large-scale precipitation products wherever rain gauges are available.

Introduction
Environment variables
Data preprocessing
A two-step merging strategy
XGBoost
Performance evaluation and comparison
Performance assessment for classification results
Performance assessment for regression results
Variable importance of ML models
Comparison of prediction accuracy of various merging approaches
Conclusion
Findings
References