Accurate soil moisture (SM) data with high spatiotemporal resolution are crucial for climate, hydrological, and agricultural research, yet existing SM products do not meet the demand for high spatiotemporal resolution. This study aims to develop and evaluate a retrieval framework that derives SM estimates at high spatial (100 m) and temporal (<3 days) resolution and can be applied at the national scale in China. To this end, it integrates multi-source data, including optical remote sensing (RS) data from Sentinel-2 and Landsat-7/8/9, synthetic aperture radar (SAR) data from Sentinel-1, and auxiliary data. Four machine learning and deep learning models are applied: Random Forest Regression (RFR), Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM) networks, and Ensemble Learning (EL). The integrated framework (IF) considers three feature scenarios (SC1: optical RS + auxiliary data; SC2: SAR + auxiliary data; SC3: optical RS + SAR + auxiliary data), encompassing 33 features in total. The main results are as follows. (1) Observed SM is significantly correlated with auxiliary data (e.g., sand fraction, r = −0.48; silt fraction, r = 0.47; evapotranspiration, r = −0.42), SAR features (e.g., the VV-polarized backscatter coefficient σ⁰VV, r = 0.47), and optical RS features (e.g., shortwave infrared band 2 (SWIR2) reflectance from Sentinel-2 and Landsat-7/8/9, r = −0.39), indicating that multi-source data provide complementary information for SM monitoring. (2) RFR and EL outperform XGBoost and LSTM overall and are the preferred models for SM prediction: their R² exceeds 0.969 on the training set and 0.743 on the test set, and their unbiased root-mean-square error (ubRMSE) is below 0.022 and 0.063 m³/m³, respectively. (3) SM prediction accuracy is highest for SC3 (optical + SAR + auxiliary data), followed by SC2 (SAR + auxiliary data) and then SC1 (optical + auxiliary data).
(4) As the Normalized Difference Vegetation Index (NDVI) and SM values increase, the prediction accuracy of the trained models generally decreases. (5) In 2021 and 2022, ignoring cloud cover, the IF could in theory achieve an SM revisit time of 1–3 days over 95.01% and 96.53% of China's area, respectively. By contrast, SC1 alone achieved a 1–3-day revisit time over 60.73% of China's area in 2021 and 69.36% in 2022, whereas the areas covered by SC2 and SC3 at this revisit time each accounted for less than 1% of the country. This study validates the effectiveness of combining multi-source RS data with auxiliary data for large-scale SM monitoring and offers new methods for improving the accuracy and spatiotemporal coverage of SM retrieval.
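The accuracy figures in results (1) and (2) rest on three standard statistics: Pearson's r, the coefficient of determination R², and the ubRMSE, which is the RMSE after the mean bias is removed, ubRMSE = √(RMSE² − bias²). The minimal plain-Python sketch below shows how these can be computed from paired observed and predicted SM values. It assumes R² is defined as 1 − SS_res/SS_tot. The abstract does not state which R² definition the study uses, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2_score(obs, pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    n = len(obs)
    mo = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def ubrmse(obs, pred):
    """Unbiased RMSE: sqrt(RMSE^2 - bias^2), i.e. the RMSE of the anomalies."""
    n = len(obs)
    bias = sum(p - o for o, p in zip(obs, pred)) / n
    rmse_sq = sum((p - o) ** 2 for o, p in zip(obs, pred)) / n
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(rmse_sq - bias ** 2, 0.0))
```

A prediction that is off by a constant offset has a nonzero RMSE but a ubRMSE of (almost exactly) zero, which is why ubRMSE is commonly reported alongside R² for SM retrievals.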
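Result (5) depends on how a per-pixel revisit time is derived from the combined acquisitions. The abstract does not give the exact definition, so the sketch below assumes one: merge the observation dates from every sensor covering a pixel, take the mean gap between consecutive distinct dates, and report the share of area whose mean gap is at most 3 days. As in the "without considering cloud cover" case, every acquisition is counted regardless of clouds. The function names and the toy dates are illustrative only.

```python
from datetime import date


def mean_revisit_days(*sensor_dates):
    """Mean gap (days) between consecutive distinct observation dates after
    merging the acquisitions of all sensors covering one pixel.
    Returns None when fewer than two distinct dates are available."""
    merged = sorted({d for dates in sensor_dates for d in dates})
    if len(merged) < 2:
        return None
    gaps = [(b - a).days for a, b in zip(merged, merged[1:])]
    return sum(gaps) / len(gaps)


def area_fraction_within(revisits, pixel_areas, max_days=3.0):
    """Share of the total area whose revisit time is at most max_days.
    Pixels without a revisit estimate (None) count as not covered."""
    total = sum(pixel_areas)
    covered = sum(
        a for r, a in zip(revisits, pixel_areas)
        if r is not None and r <= max_days
    )
    return covered / total


# Toy example: Sentinel-2-like and Sentinel-1-like dates for one pixel.
s2_dates = [date(2021, 6, 1), date(2021, 6, 6)]
s1_dates = [date(2021, 6, 3), date(2021, 6, 9)]
print(mean_revisit_days(s2_dates, s1_dates))  # gaps 2, 3, 3 days -> 8/3
```

Merging sensors before computing gaps is what lets a combined framework reach shorter revisit times than any single scenario; a stricter definition (e.g. the maximum gap instead of the mean) would give lower coverage percentages.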
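Result (4) implies a stratified evaluation in which accuracy is computed separately for different vegetation densities. The minimal sketch below assumes that NDVI is computed as (NIR − Red)/(NIR + Red) from surface reflectance and that validation samples are grouped into fixed-width NDVI bins. The bin edges and function names are hypothetical and are not taken from the study.

```python
import math


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), from surface reflectances."""
    denom = nir + red
    if denom == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / denom


def rmse_by_ndvi_bin(samples, edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Group (ndvi, observed_sm, predicted_sm) samples into NDVI bins
    [edges[i], edges[i+1]) (the last bin also includes its upper edge)
    and return {(lo, hi): (count, rmse)} for the non-empty bins.
    Samples outside [edges[0], edges[-1]] are skipped."""
    sums = {}
    for v, obs, pred in samples:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= v < hi or (hi == edges[-1] and v == hi):
                n, sq = sums.get((lo, hi), (0, 0.0))
                sums[(lo, hi)] = (n + 1, sq + (pred - obs) ** 2)
                break
    return {b: (n, math.sqrt(sq / n)) for b, (n, sq) in sums.items()}
```

Samples with negative NDVI (for example, water surfaces) fall outside the default edges and are skipped. Passing a different `edges` tuple includes them or changes the stratification.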