Abstract

This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. The main objectives are to explore whether a hybrid approach (crop modeling + ML) results in better predictions, to identify which combinations of hybrid models provide the most accurate predictions, and to determine which crop modeling features are most effective when integrated with ML for corn yield prediction. Five ML models (linear regression, LASSO, LightGBM, random forest, and XGBoost) and six ensemble models were designed to address the research question. The results suggest that adding variables simulated by the crop model (APSIM) as input features to ML models can decrease yield prediction root mean squared error (RMSE) by 7% to 20%. Furthermore, we investigated partial inclusion of APSIM features in the ML prediction models and found that soil-moisture-related APSIM variables are the most influential on the ML predictions, followed by crop-related and phenology-related variables. Finally, based on feature importance measures, simulated APSIM average drought stress and average water table depth during the growing season are the most important APSIM inputs to ML. This result indicates that weather information alone is not sufficient and that ML models need more hydrological inputs to make improved yield predictions.
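The RMSE comparison between a benchmark and a hybrid scenario can be illustrated with a minimal sketch. The function names `rmse` and `rmse_reduction_pct` and all yield values below are hypothetical, chosen only to show the arithmetic; they are not the study's code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted yields."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction_pct(rmse_benchmark, rmse_hybrid):
    """Percent decrease in RMSE when moving from benchmark to hybrid."""
    return 100.0 * (rmse_benchmark - rmse_hybrid) / rmse_benchmark


# Hypothetical observed yields and predictions (illustrative numbers only).
observed = [10.0, 12.0, 11.0, 9.0]
pred_benchmark = [11.0, 10.0, 12.0, 10.0]  # ML model without APSIM features
pred_hybrid = [10.5, 11.0, 11.5, 9.5]      # same ML model plus APSIM features

rmse_b = rmse(observed, pred_benchmark)
rmse_h = rmse(observed, pred_hybrid)
reduction = rmse_reduction_pct(rmse_b, rmse_h)
print(f"benchmark RMSE={rmse_b:.3f}, hybrid RMSE={rmse_h:.3f}, "
      f"reduction={reduction:.1f}%")
```

A positive value from `rmse_reduction_pct` means the hybrid predictions have lower error than the benchmark; a negative value would mean the added features made predictions worse.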

Highlights

  • This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt

  • Adding Agricultural Production Systems sIMulator (APSIM) variables as input features to ML models improved the performance of the 11 developed ML models

  • Comparing the lowest prediction errors (RMSE) of the benchmark and the hybrid scenario, we found that the use of hybrid models achieved 8%-9% better corn yield predictions


Summary

Introduction

This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt. Jiang et al.[25] devised a long short-term memory (LSTM) model that incorporates heterogeneous crop phenology, meteorology, and remote sensing data to predict county-level corn yields. This model outperformed LASSO and random forest and explained 76% of yield variations across the Corn Belt. In another study, the problem was formulated as a classification problem with the objective of labeling unseen observations' agro-ecologies (highlands or lowlands). The authors found that linear discriminant analysis (LDA) performed better than the other trained models, including logistic regression, K-nearest neighbor, decision tree, naïve Bayes, and support vector machines (SVM), with a prediction accuracy of 61%.

