Accurate crop yield prediction is critical for food security and efficient agricultural management, particularly in the face of climate change and a growing global population. Current predictive models often generalize poorly across diverse agricultural contexts because they fail to capture the complex interactions among climatic and soil variables. This study addresses these gaps by proposing a comprehensive machine-learning framework that uses ensemble methods to improve crop yield prediction accuracy. Using a dataset enriched with climatic and agricultural features, we evaluated multiple models: Linear Regression, Decision Tree, Random Forest, Gradient Boosting, XGBoost, Bagging Regressor, and k-nearest neighbors (KNN). Random Forest performed best, achieving an accuracy of 0.985 and a mean squared error (MSE) of 1.08e+08, while the Bagging Regressor closely followed with an accuracy of 0.984 and a comparable MSE. Gradient Boosting and XGBoost also performed robustly, with accuracies ranging from 0.865 to 0.974 and MSE values between 1.89e+08 and 9.60e+08. Our approach includes extensive hyperparameter tuning and k-fold cross-validation to assess model generalizability and robustness across agricultural scenarios. These findings highlight the effectiveness of ensemble methods in capturing complex data relationships and their superiority over traditional models in predicting crop yields. Our work lays the groundwork for future research that integrates real-time data and advanced hybrid models to further refine predictive accuracy and support sustainable agricultural practices.
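A model comparison of this kind, with hyperparameter tuning and k-fold cross-validation, might be set up roughly as follows. This is a minimal sketch assuming scikit-learn, not the authors' code. The data are synthetic (scikit-learn's `make_friedman1` benchmark, which has nonlinear feature interactions) and stand in for the study's climatic and soil features. XGBoost is omitted to avoid an extra dependency. The fold count, the tuning grid, and the helper name `compare_models` are hypothetical choices. The abstract does not define its "accuracy" metric, so reporting R² here is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: synthetic nonlinear data as a placeholder for the real crop-yield table.
X, y = make_friedman1(n_samples=300, n_features=10, noise=1.0, random_state=0)

# Hypothetical tuning grid for Random Forest; the study's actual grid is not given.
rf_tuned = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid={"max_depth": [None, 8], "min_samples_leaf": [1, 3]},
    cv=3,
)

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest (tuned)": rf_tuned,
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    "Bagging": BaggingRegressor(n_estimators=50, random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}


def compare_models(X, y, models, n_splits=5, seed=0):
    """Score each model with shuffled k-fold CV; return {name: (mean R^2, mean MSE)}."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=("r2", "neg_mean_squared_error"))
        # scikit-learn reports MSE as a negated score, so flip the sign back.
        results[name] = (scores["test_r2"].mean(), -scores["test_neg_mean_squared_error"].mean())
    return results


results = compare_models(X, y, models)
# Print models from highest to lowest mean R^2.
for name, (r2, mse) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:22s} R^2={r2:.3f}  MSE={mse:.3e}")
```

Wrapping `GridSearchCV` inside `cross_validate` gives nested cross-validation. Tuning then happens only within each outer training fold, so the reported scores are not inflated by selecting hyperparameters on the evaluation data.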