Advances in deep learning methods have presented new opportunities and challenges for predicting land surface variables (LSVs), because these prediction tasks resemble tasks in computer science. However, few studies have focused on benchmark datasets for LSV prediction, which hampers fair comparison of different data-driven deep learning models. Hence, we propose an LSV benchmark dataset and prediction toolbox to boost research in data-driven LSV modeling and improve the consistency of data-driven deep learning models for LSVs. The benchmark dataset contains a large number of hydrology-related variables, such as global soil moisture and runoff, which can be used to verify the simulation of hydrological processes. Global data from the European Centre for Medium-Range Weather Forecasts reanalysis 5 (ERA5), ERA5-Land, global gridded soil information (SoilGrid), soil moisture storage capacity (SMSC), and Moderate Resolution Imaging Spectroradiometer (MODIS) datasets have been pre-processed into daily data at 0.5-, 1-, 2-, and 4-degree resolutions to facilitate their use in data-driven models. Two simple statistical metrics, the root mean squared error and the correlation coefficient, are chosen to evaluate the performance of different deep learning (DL) models, including convolutional neural network, long short-term memory, and convolutional long short-term memory models, at lead times of 1 and 5 days. A process-based model serves as a physical baseline, and soil moisture and surface sensible heat flux are taken as the target variables. The benchmark dataset and evaluation metrics for predicting LSVs with data-driven approaches, together named the LandBench toolbox, were implemented in PyTorch. This toolbox facilitates the reimplementation of existing methods, the development of novel predictive models, and the use of unified evaluation metrics.
Additionally, the toolbox incorporates address mapping technology to enable high-resolution global predictions with constrained computing resources. We hope LandBench will not only serve as a standardized framework that fosters equitable model comparisons, but also provide data and a robust scientific foundation for advancing climate change research, disaster management, and sustainable development initiatives.
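
For illustration, the two evaluation metrics named above, the root mean squared error and the correlation coefficient, can be sketched as follows. This is a minimal plain-Python sketch and not the LandBench implementation (which is written in PyTorch); the function names `rmse` and `pearson_r` and the sample values are hypothetical, and the correlation coefficient is assumed here to be the Pearson coefficient computed over a time series at a single grid cell.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error between predictions and reference values."""
    if len(pred) != len(obs) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - o) ** 2 for p, o in zip(pred, obs))
    return math.sqrt(squared_error / len(pred))


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(pred)
    if n != len(obs) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_p == 0.0 or var_o == 0.0:
        # Correlation is undefined when a series has zero variance.
        return float("nan")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical soil moisture series (m^3/m^3) at one grid cell.
reference = [0.20, 0.22, 0.25, 0.24]
forecast = [0.21, 0.21, 0.26, 0.22]
print("RMSE:", rmse(forecast, reference))
print("R:", pearson_r(forecast, reference))
```

In a gridded benchmark, such metrics would typically be computed per grid cell along the time axis and then summarized over the globe; how LandBench aggregates them is not specified here.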