Comparison of Root Zone Soil Moisture Data Fusion Using Machine Learning, Triple Collocation, and Three-Cornered Hat Methods

Jing Tian, Yongqiang Zhang

doi:10.5194/egusphere-egu24-7223

Abstract

Root zone soil moisture (RZSM) is a crucial metric for assessing water stored in the soil. Modeling approaches are commonly employed to estimate RZSM. However, modelled RZSM often deviates from true RZSM values because of errors in model input data and parameters. Machine learning methods and data fusion techniques can enhance simulation accuracy. In this study, we conducted a comparative analysis of three methods for RZSM data fusion: random forest (RF), extended triple collocation (ETC), and Bayes Three Cornered Hat (BTCH).

Soil moisture observation data from 2018 to 2022 were collected at 2121 sites across China from the China Meteorological Administration (Fig. 1). Daily averages were calculated as the arithmetic mean of the hourly data and used in the analysis. Six RZSM datasets were utilized: SMAP Level 4, GLDAS-NOAH2.1, GLDAS-Catchment2.2, ERA5, MERRA2, and CFSR. All of these datasets were resampled to 0.25° to give a common spatial resolution and were arithmetically averaged to daily values. Additionally, parameters related to soil, climate, and vegetation were used to build a machine learning model, specifically a random forest model.

Fig. 1 Distribution of soil moisture sites and daily soil moisture (m3/m3) at depths ranging from 0–50 cm across China during the period from 2018 to 2019

To investigate the impact of different inputs on the performance of the RF method, three groups of inputs were employed. The specifics of the inputs used for the three methods are outlined in Table 1.
The evaluation of the RF method results was carried out using a five-fold cross-validation approach.

Table 1 Inputs used for each method

Model      Inputs
RFmodel1   NOAH, SMAP, ERA5, MERRA2, CFSR, CLSM, LAI, Soil properties, Meteorological data
RFmodel2   NOAH, LAI, Soil properties, Meteorological data
RFmodel3   NOAH, SMAP, ERA5, MERRA2, CFSR, CLSM
BTCH       NOAH, SMAP, ERA5, MERRA2, CFSR, CLSM
ETC        NOAH, MERRA2, CLSM

The boxplots (Fig. 2) show that RFmodel1 performs best, emphasizing the need for comprehensive information in machine learning models. RFmodel2 is superior to RFmodel3, which highlights the significance of LAI, soil properties, and meteorological data in RZSM estimation. ETC and BTCH outperform the individual RZSM datasets, which makes them especially useful in the absence of true data. The superior performance of ETC over BTCH is attributed to ETC's inputs, namely NOAH, MERRA2, and CLSM, which exhibit better accuracy than SMAP, ERA5, and CFSR, the additional inputs used by BTCH.

Fig. 2 Boxplots of the Pearson coefficient (R), Root Mean Square Error (RMSE), and bias between in situ root zone soil moisture (RZSM) and its estimates from the three random forest models, Bayes Three Cornered Hat (BTCH), and Extended Triple Collocation (ETC) methods

In summary, the random forest method outperforms BTCH and ETC in the fusion of root zone soil moisture (RZSM) data, highlighting the importance of including leaf area index (LAI), soil properties, and meteorological data in the construction of the random forest model. Both BTCH and ETC demonstrate utility in enhancing RZSM estimates, making them valuable options when true data is unavailable.
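The abstract names extended triple collocation but does not spell out the computation, so the following is only a minimal sketch of the commonly used covariance-based formulation, not the authors' implementation. It assumes three collocated series that are each linearly related to the unknown truth, with zero-mean errors that are independent of each other and of the truth. The function names, the synthetic series, and all noise levels are hypothetical choices made for illustration; they stand in for products such as NOAH, MERRA2, and CLSM but are not real data. The step that merges the products into a fused estimate is not shown.

```python
import math
import random


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def extended_triple_collocation(x1, x2, x3):
    """Estimate error standard deviations and squared correlations with truth.

    Assumes x_i = a_i + b_i * truth + e_i, with errors e_i mutually
    independent and independent of the truth (an assumption of this sketch).
    Returns two 3-tuples: (error_std, rho_squared), one entry per series.
    """
    q11, q22, q33 = covariance(x1, x1), covariance(x2, x2), covariance(x3, x3)
    q12, q13, q23 = covariance(x1, x2), covariance(x1, x3), covariance(x2, x3)

    # Variance of the truth-related part of each series, e.g. b_1^2 * var(truth).
    signal_var = (q12 * q13 / q23, q12 * q23 / q13, q13 * q23 / q12)
    total_var = (q11, q22, q33)

    # Error variance is total variance minus signal variance (clipped at zero).
    error_std = tuple(
        math.sqrt(max(t - s, 0.0)) for t, s in zip(total_var, signal_var)
    )
    # Squared correlation of each series with the unobserved truth.
    rho_squared = tuple(s / t for s, t in zip(signal_var, total_var))
    return error_std, rho_squared


def make_synthetic(n=20000, seed=0):
    """Build three hypothetical soil moisture series with known error levels."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.25, 0.05) for _ in range(n)]
    x1 = [t + rng.gauss(0.0, 0.02) for t in truth]
    x2 = [0.9 * t + 0.01 + rng.gauss(0.0, 0.03) for t in truth]
    x3 = [1.1 * t - 0.02 + rng.gauss(0.0, 0.04) for t in truth]
    return x1, x2, x3


x1, x2, x3 = make_synthetic()
error_std, rho_squared = extended_triple_collocation(x1, x2, x3)
for name, err, rho2 in zip(("x1", "x2", "x3"), error_std, rho_squared):
    print(f"{name}: error std = {err:.4f} m3/m3, rho^2 = {rho2:.3f}")
```

With the synthetic inputs above, the estimated error standard deviations should land close to the noise levels used to generate each series (0.02, 0.03, and 0.04 m3/m3), which is the property that makes the approach usable when no true data are available.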
