Abstract
Agricultural nitrous oxide (N2O) emissions account for a non-trivial fraction of the global greenhouse gas (GHG) budget. Estimating N2O fluxes from cropland remains challenging because the underlying microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plants, and human activities. Existing approaches such as process-based (PB) models have well-known limitations arising from insufficient representation of the processes and poorly constrained model parameters. To leverage recent advances in machine learning (ML), a new method is needed that opens the “black box” and overcomes ML's own limitations: low interpretability, out-of-sample failure, and massive data demand. In this study, we developed a first-of-its-kind knowledge-guided machine learning model for agroecosystems (KGML-ag) by incorporating biogeophysical/chemical domain knowledge from an advanced PB model, ecosys, and tested it by simulating daily N2O fluxes against observed data from mesocosm experiments. The Gated Recurrent Unit (GRU) was used as the basis of the model structure. To optimize model performance, we investigated a range of ideas: (1) using initial values of intermediate variables (IMVs) instead of their time series as model input to reduce data demand; (2) building hierarchical structures that explicitly estimate IMVs for subsequent N2O prediction; (3) using multitask learning to balance simultaneous training on multiple variables; and (4) pretraining with millions of synthetic data points generated by ecosys and fine-tuning with mesocosm observations. Six pure ML models were developed using the same mesocosm data to serve as benchmarks for KGML-ag. Results show that KGML-ag reproduced the mesocosm N2O fluxes well (overall r² = 0.81 and RMSE = 3.6 mg N m−2 day−1 in cross-validation).
Importantly, KGML-ag always outperforms the PB model and the pure ML models in predicting N2O fluxes, especially for complex temporal dynamics and emission peaks. In addition, KGML-ag goes beyond the pure ML models by providing more interpretable predictions and by pinpointing the new knowledge and data needed to further improve it. We believe that the development of KGML-ag in this study will stimulate a new body of research on interpretable ML for biogeochemistry and other related geoscience processes.
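The hierarchical structure and multitask training described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the class `GRULayer`, the functions `hierarchical_forward` and `multitask_loss`, the layer sizes, the random weights, and the loss weight `w_imv` are all hypothetical. The sketch shows only a forward pass in which one GRU estimates IMVs from daily drivers and a second GRU uses the drivers plus the estimated IMVs to predict N2O, followed by a weighted loss over both targets. Feeding initial IMV values as inputs, pretraining, and fine-tuning are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRULayer:
    """Minimal GRU forward pass with a linear readout (illustration only, no training)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        scale = 0.1  # hypothetical initialization scale
        self.W = rng.normal(0, scale, (3, n_hidden, n_in))      # input weights: update, reset, candidate
        self.U = rng.normal(0, scale, (3, n_hidden, n_hidden))  # recurrent weights
        self.b = np.zeros((3, n_hidden))
        self.V = rng.normal(0, scale, (n_out, n_hidden))        # readout weights
        self.n_hidden = n_hidden

    def forward(self, x_seq):
        h = np.zeros(self.n_hidden)
        outputs = []
        for x in x_seq:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])        # update gate
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])        # reset gate
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])  # candidate state
            h = (1.0 - z) * n + z * h
            outputs.append(self.V @ h)
        return np.array(outputs)  # shape (T, n_out)

def hierarchical_forward(imv_net, n2o_net, drivers):
    """First estimate IMVs, then predict N2O from drivers plus estimated IMVs."""
    imv_pred = imv_net.forward(drivers)
    n2o_in = np.concatenate([drivers, imv_pred], axis=1)
    n2o_pred = n2o_net.forward(n2o_in)[:, 0]
    return imv_pred, n2o_pred

def multitask_loss(imv_pred, imv_obs, n2o_pred, n2o_obs, w_imv=0.5):
    """Weighted sum of per-task MSEs; NaN entries mark missing observations and are ignored."""
    mse_imv = np.nanmean((imv_pred - imv_obs) ** 2)
    mse_n2o = np.nanmean((n2o_pred - n2o_obs) ** 2)
    return w_imv * mse_imv + (1.0 - w_imv) * mse_n2o

# Toy example with random data (all sizes are assumptions).
rng = np.random.default_rng(0)
T, n_drivers, n_imv = 30, 5, 3
drivers = rng.normal(size=(T, n_drivers))
imv_obs = rng.normal(size=(T, n_imv))
n2o_obs = rng.normal(size=T)
n2o_obs[::3] = np.nan  # pretend some days have no flux observation

imv_net = GRULayer(n_drivers, 8, n_imv, rng)
n2o_net = GRULayer(n_drivers + n_imv, 8, 1, rng)
imv_pred, n2o_pred = hierarchical_forward(imv_net, n2o_net, drivers)
loss = multitask_loss(imv_pred, imv_obs, n2o_pred, n2o_obs)
```

In a design like this, the intermediate predictions stay inspectable, which is one way a hierarchical model can be more interpretable than a network that maps drivers directly to N2O.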
Highlights
Process-based (PB) models are often used to simulate N2O fluxes from agroecosystems, but they have inherent limitations that cannot be resolved within the PB model itself: incomplete knowledge of the processes, low accuracy due to under-constrained parameters, high computing cost, and a rigid structure that hinders further improvement.
3.1 Pretraining experiments using synthetic data from ecosys: In the pretraining stage, the Gated Recurrent Unit (GRU) model with 76 IMVs achieved the best performance in predicting N2O fluxes (r² = 0.98, root mean square error (RMSE) = 0.54 mg N m−2 day−1, and normalized RMSE (NRMSE) = 0.01) on the test set of synthetic data generated from ecosys (Table 1).
5 Conclusions: In this study, two KGML-ag models have been developed, validated, and tested for agricultural soil N2O flux prediction using synthetic data generated by the PB model ecosys and observational data from a mesocosm facility.
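The evaluation metrics quoted above (r2, RMSE, NRMSE) can be computed as in the following sketch. The exact definitions used in the study are not given in this text, so two assumptions are made here and flagged in the comments: r2 is taken as the squared Pearson correlation, and NRMSE is taken as RMSE divided by the range of the observations. The function names and the toy data are hypothetical.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def nrmse(obs, pred):
    """RMSE normalized by the observed range (assumption: range normalization)."""
    obs = np.asarray(obs, dtype=float)
    return rmse(obs, pred) / float(obs.max() - obs.min())

def r_squared(obs, pred):
    """Squared Pearson correlation coefficient (assumption: r2 means this)."""
    r = np.corrcoef(obs, pred)[0, 1]
    return float(r ** 2)

# Toy daily fluxes in mg N m-2 day-1 (made-up numbers, not from the study).
obs = [0.0, 2.0, 4.0, 6.0, 8.0]
pred = [1.0, 2.0, 4.0, 6.0, 7.0]

print(f"RMSE  = {rmse(obs, pred):.3f}")
print(f"NRMSE = {nrmse(obs, pred):.3f}")
print(f"r2    = {r_squared(obs, pred):.3f}")
```

Under the range-normalization assumption, an RMSE of 0.54 mg N m−2 day−1 with an NRMSE of 0.01 would correspond to an observed flux range of roughly 54 mg N m−2 day−1.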
Summary
Nitrous oxide (N2O), with a global warming potential 273 ± 118 times greater than that of carbon dioxide (CO2) over a 100-year time horizon, is one of the important greenhouse gases (IPCC6; Forster et al., 2021). N2O is intimately connected with soil organic carbon (SOC) dynamics, because soil nitrifiers and denitrifiers interact strongly with the aerobic and anaerobic heterotrophs that drive SOC evolution, and all of these microbes are driven by shared environmental variables including soil temperature, moisture, redox status, and physical and chemical properties (Thornley et al., 2007). As expected, these connections make it difficult for PB models, even the most advanced ones such as ecosys, to represent the physical and biogeochemical processes sufficiently or to obtain enough data to calibrate a large number of model parameters with strong spatio-temporal variations. Innovatively combining the power of ML models with the process understanding of PB models is likely a path forward.