Accurately predicting crop yield is one of the most challenging tasks in the agricultural sector. Machine learning algorithms for yield prediction are typically trained on real field data. In this study, we instead used data generated by the Wild Blueberry Pollination Model, a spatially explicit simulation model validated against field observations and experimental data collected in Maine, USA, over the last 30 years. The main aim of this study is to evaluate the relative importance of bee species composition and weather factors in regulating wild blueberry agroecosystems. Specifically, we sought to reveal how bee species composition and weather affect yield, and to predict the bee species composition and weather conditions that achieve the best yield, using computer simulation and machine learning algorithms. Multiple linear regression (MLR), boosted decision trees (BDT), random forest (RF), and extreme gradient boosting (XGBoost) were evaluated as predictive tools. We also performed predictor selection before submitting our data to the learning algorithms, which reduced the dimension of the input without a significant drop in prediction accuracy. Clone size, honeybee, bumblebee, Andrena bee species, Osmia bee species, maximum of upper-temperature ranges, and the number of days with precipitation were chosen as the best predictor variable subset. XGBoost outperformed the other algorithms on all measures of model performance for predicting wild blueberry yield, achieving a coefficient of determination (R²) of 0.938, a root mean square error (RMSE) of 343.026, a mean absolute error (MAE) of 206, and a relative root mean square error (RRMSE) of 5.444%. The results are consistent with previous work on predicting wild blueberry fruit yield using digital color photography (Zaman et al., 2008). This study showed that crop yield predictions can be based on computer simulation modeling datasets. Therefore, provided a reasonable prediction can be reached, this approach should have a significant impact, especially when data collection in the field is challenging.
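The four performance measures reported above can be computed from paired observed and predicted yields. The following is a minimal sketch using only the Python standard library, not the study's own code. The function name `regression_metrics` and the toy yield values are hypothetical. RRMSE is assumed here to be RMSE divided by the mean observed yield, expressed as a percentage; the abstract does not state the definition used.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE, MAE and RRMSE (%) for paired observed/predicted values.

    Assumption: RRMSE = 100 * RMSE / mean(observed). The source abstract does
    not define RRMSE, so this normalisation is illustrative only.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)

    return {
        "r2": 1.0 - ss_res / ss_tot,  # undefined if all observed values are equal
        "rmse": rmse,
        "mae": sum(abs(r) for r in residuals) / n,
        "rrmse_pct": 100.0 * rmse / mean_obs,
    }


# Hypothetical toy values, not data from the study.
observed = [6000.0, 6500.0, 5800.0, 7000.0]
predicted = [6100.0, 6400.0, 5900.0, 6800.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are in the same units as yield, while R² and RRMSE are unitless, which is why the abstract reports RRMSE as a percentage.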