Introduction: Machine learning methods combined with satellite imagery have the potential to improve estimates of carbon uptake by terrestrial ecosystems, including croplands. Studying carbon uptake patterns across the U.S. through research networks such as the Long-Term Agroecosystem Research (LTAR) network makes it possible to examine broader trends in crop productivity and sustainability.

Methods: In this study, gross primary productivity (GPP) estimates from the Moderate Resolution Imaging Spectroradiometer (MODIS) for three LTAR cropland sites were integrated into a machine learning modeling effort. The sites are Kellogg Biological Station (KBS; 2 towers, 20 site-years), Upper Mississippi River Basin (UMRB-Rosemount; 1 tower, 12 site-years), and Platte River High Plains Aquifer (PRHPA; 3 towers, 52 site-years). All sites were planted to maize (Zea mays L.) and soybean (Glycine max (L.) Merr.). The MODIS GPP product was first compared to in-situ measurements from eddy covariance (EC) instruments at each site, and then to all sites combined. Next, machine learning algorithms were used to produce refined GPP estimates from five inputs: air temperature, precipitation, crop type (maize or soybean), agroecosystem, and the MODIS GPP product. The AutoML tool in the h2o package tested a variety of individual and combined algorithms, including Gradient Boosting Machines (GBM), eXtreme Gradient Boosting (XGBoost), and Stacked Ensembles.

Results and discussion: Before machine learning was applied, the coefficient of determination (r²) of the raw comparison between MODIS GPP and EC GPP was 0.38. The best-performing model across all sites was a Stacked Ensemble, with a validation r² of 0.87, a root mean square error (RMSE) of 2.62, and a mean absolute error (MAE) of 1.59, both errors expressed in GPP units. The machine learning approach successfully simulated GPP across three agroecosystems and two crops.
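The r², RMSE, and MAE values reported above are standard goodness-of-fit statistics between observed (EC) and modeled GPP. The following is a minimal sketch of how they can be computed with the Python standard library. It assumes r² is defined as 1 − SS_res/SS_tot; the abstract does not say which definition was used, and some studies instead report the squared Pearson correlation, which can differ for biased predictions. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compare paired observed (e.g. EC GPP) and predicted (e.g. model GPP) values.

    Assumption: r2 is the coefficient of determination 1 - SS_res / SS_tot,
    not the squared Pearson correlation.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residuals are observed minus predicted, one per paired sample.
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; r2 is undefined")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),                     # same units as GPP
        "mae": sum(abs(r) for r in residuals) / n,          # same units as GPP
    }


if __name__ == "__main__":
    # Hypothetical EC and predicted GPP values, for illustration only.
    ec_gpp = [2.0, 4.0, 6.0, 8.0]
    model_gpp = [2.5, 3.5, 6.5, 7.5]
    print(regression_metrics(ec_gpp, model_gpp))
```

With these illustrative values every residual is ±0.5, so RMSE and MAE are both 0.5 and r² is 1 − 1/20 ≈ 0.95. Because RMSE squares residuals before averaging, it is always at least as large as MAE, which is consistent with the reported RMSE (2.62) exceeding the MAE (1.59).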