Abstract
Crop yield forecasting depends on many interactive factors, including crop genotype, weather, soil, and management practices. This study analyzes the performance of machine learning and deep learning methods for winter wheat yield prediction using an extensive dataset of weather, soil, and crop phenology variables in 271 counties across Germany from 1999 to 2019. We proposed a Convolutional Neural Network (CNN) model, which uses a 1-dimensional convolution operation to capture the time dependencies of environmental variables. We used eight supervised machine learning models as baselines and evaluated their predictive performance using RMSE, MAE, and correlation coefficient metrics to benchmark the yield prediction results. Our findings suggested that nonlinear models such as the proposed CNN, Deep Neural Network (DNN), and XGBoost were more effective than linear models at capturing the relationship between crop yield and the input data. Our proposed CNN model outperformed all baseline models used for winter wheat yield prediction (7 to 14% lower RMSE, 3 to 15% lower MAE, and 4 to 50% higher correlation coefficient than the best-performing baseline across test data). We aggregated soil moisture and meteorological features at weekly resolution to address the seasonality of the data. We also moved beyond prediction and interpreted the outputs of our proposed CNN model using SHAP and force plots, which provided key insights for explaining the yield prediction results (importance of variables by time). We found the drained upper limit (DUL), wind speed at week ten, and radiation amount at week seven to be the most critical features in winter wheat yield prediction.
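The weekly aggregation and the 1-dimensional convolution described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the made-up daily series, and the two-week kernel are all assumptions chosen for illustration, and a real CNN would learn its kernel weights and stack several such layers.

```python
from statistics import mean


def weekly_means(daily, days_per_week=7):
    """Average a daily series into consecutive weekly values.

    A trailing partial week is dropped (an assumption of this sketch).
    """
    n_weeks = len(daily) // days_per_week
    return [
        mean(daily[w * days_per_week:(w + 1) * days_per_week])
        for w in range(n_weeks)
    ]


def conv1d(series, kernel, bias=0.0):
    """Slide a kernel along the time axis without padding.

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    k = len(kernel)
    return [
        bias + sum(series[t + i] * kernel[i] for i in range(k))
        for t in range(len(series) - k + 1)
    ]


# Made-up daily values for four weeks (hypothetical, not study data).
daily_temperature = [float(day % 14) for day in range(28)]

weekly = weekly_means(daily_temperature)
# A hypothetical kernel that responds to week-over-week change.
features = conv1d(weekly, kernel=[-1.0, 1.0])

print(weekly)    # [3.0, 10.0, 3.0, 10.0]
print(features)  # [7.0, -7.0, 7.0]
```

Each output value depends only on a short window of adjacent weeks, which is how a 1-D convolution picks up local temporal patterns in an environmental variable.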
Highlights
Crop yield forecasting depends on many interactive factors, including crop genotype, weather, soil, and management practices
We used eight supervised machine learning models, namely K‐Nearest Neighbor (KNN), Random Forest, XGBoost, Regression Tree, Lasso and Ridge Regressions, Support Vector Regression (SVR), and Deep Neural Network (DNN), as baselines against which to compare the performance of our proposed model
Based on the chosen validation metrics (RMSE, mean absolute error (MAE), and correlation coefficient), our analysis showed that nonlinear models such as DNN, Convolutional Neural Network (CNN), and XGBoost outperform linear models to varying extents, because crop yield is highly complex and depends on numerous interactive factors such as genotype and environmental conditions
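The three validation metrics named in the highlights can be computed as in the sketch below. The source says only "correlation coefficient"; Pearson's r is assumed here. The yield values are invented for illustration and are not results from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed; the source does not specify)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(ss_t * ss_p)


# Hypothetical observed and predicted yields (illustrative units only).
observed = [7.0, 8.0, 6.0, 9.0]
predicted = [7.5, 7.5, 6.5, 8.5]

print(rmse(observed, predicted))       # 0.5
print(mae(observed, predicted))        # 0.5
print(pearson_r(observed, predicted))  # about 0.949
```

RMSE penalizes large errors more heavily than MAE, while the correlation coefficient measures how well predictions track the ordering and spread of observed yields regardless of bias, so the three are usually reported together.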
Summary
Crop yield forecasting depends on many interactive factors, including crop genotype, weather, soil, and management practices. Agricultural challenges aggravated by climate change and the increased food demand driven by the ever-growing population (expected to reach 9 billion by 2030) [1] highlight the importance of a timely, regional-scale crop yield prediction model that can be used to manage crops better, ensure food security, and improve policy and agricultural decision making. For this reason, a variety of approaches, ranging from process-based models to data-driven statistical algorithms, have been developed and applied to predict crop yield. This study aims to (1) benchmark winter wheat yield prediction in Germany using state-of-the-art supervised ML algorithms; (2) propose a CNN-based architecture with a 1-D convolution operation that outperforms the baselines in terms of accuracy; and (3) evaluate the effects of weather, soil, and crop phenology variables on the prediction results and study the critical ranges in which each feature increases or decreases the predicted yield at different times of the year. The calculations were based on CORINE Land Cover 2006 land-use data, available at a resolution of 250 m [26]. The meteorological variables used in our analyses are wind speed, maximum and minimum temperature, relative humidity, precipitation, and solar radiation.
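Objective (3), attributing a prediction to individual input variables, rests on Shapley values, which SHAP approximates for real models. The sketch below computes exact Shapley values for a toy model by enumerating feature subsets, with absent features replaced by a baseline value. The toy model, its coefficients, the feature names, and the baseline are all hypothetical; this is not the study's CNN or the SHAP library's algorithm.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_yield_model(features):
    """Hypothetical nonlinear model with one interaction term."""
    dul, wind_week10, radiation_week7 = features
    return 2.0 * dul - 0.5 * wind_week10 + 1.0 * dul * radiation_week7


x = [1.0, 2.0, 3.0]         # hypothetical feature values
baseline = [0.0, 0.0, 0.0]  # hypothetical reference input

phi = shapley_values(toy_yield_model, x, baseline)
print(phi)  # approximately [3.5, -1.0, 1.5]
```

The attributions sum to the difference between the prediction at `x` and at the baseline (here 4.0), and the interaction term is split evenly between the two features involved. Exact enumeration grows exponentially with the number of features, which is why approximations are used in practice.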