Bayesian model averaging to improve the yield prediction in wheat breeding trials

Shuaipeng Fei, Zhen Chen, Lei Li, Yuntao Ma, Yonggui Xiao

doi:10.1016/j.agrformet.2022.109237

Abstract

Accurate pre-harvest prediction of wheat yield from secondary traits helps to facilitate plant breeding and reduce costs. Machine learning (ML) algorithms are increasingly applied to predict grain yield from remote sensing data. However, the performance of individual ML algorithms varies across species and environments because of differing sources of uncertainty. This study proposed a novel wheat yield prediction framework based on canopy hyperspectral reflectance (350–2500 nm) and adopted the ensemble Bayesian model averaging (EBMA) method to improve model performance. To develop the yield prediction models, important bands extracted by the Boruta feature selection method were fed into four linear ML models and four nonlinear ML models. Bayesian model averaging (BMA) weights, obtained from each model's cross-validation performance, were then used to combine the predictions of the individual ML models. Compared to the best-performing individual model, EBMA models that integrated only the linear models or only the nonlinear models achieved just a slight improvement in accuracy. Additionally, the simultaneous integration of two linear models and two nonlinear models was analyzed. Results indicate that most EBMA combinations of mixed linear and nonlinear models achieved higher prediction accuracy than both the combinations integrating a single type of model and the best-performing individual model. An advantage of the EBMA method is that it produces a predictive distribution that reflects the uncertainty associated with deterministic predictions. With full consideration of the model diversity of ensemble members, the EBMA modeling framework provides an alternative method for predicting grain yield in plant breeding trials.
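
To make the weighting step more concrete, the following is a minimal Python sketch of how BMA weights might be estimated from cross-validated predictions and then used to combine models. It is not the authors' implementation: the abstract does not specify the estimation procedure, so the sketch assumes the commonly used expectation-maximization scheme with Gaussian kernels and a single shared variance. The function names `fit_bma_weights` and `bma_predict`, and the synthetic data, are hypothetical and serve only as illustration.

```python
import numpy as np


def fit_bma_weights(cv_preds, y, n_iter=500, tol=1e-8):
    """Estimate BMA weights by EM (assumed scheme, not taken from the paper).

    cv_preds: (n_samples, n_models) out-of-fold predictions of each model.
    y:        (n_samples,) observed yields.
    Returns the model weights and the shared Gaussian variance.
    """
    n, k = cv_preds.shape
    err2 = (y[:, None] - cv_preds) ** 2
    w = np.full(k, 1.0 / k)          # start from equal weights
    sigma2 = err2.mean()             # start from the pooled squared error
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each model "explains" each sample
        dens = np.exp(-0.5 * err2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
        mix = dens * w
        total = np.maximum(mix.sum(axis=1, keepdims=True), 1e-300)
        z = mix / total
        # M-step: update the weights and the shared variance
        w = z.mean(axis=0)
        sigma2 = np.sum(z * err2) / n
        # Stop once the mixture log-likelihood no longer changes
        ll = np.sum(np.log(total))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, sigma2


def bma_predict(preds, w, sigma2):
    """Combine model predictions into a BMA mean and predictive variance."""
    mean = preds @ w
    # Predictive variance = between-model spread + within-model variance
    between = np.sum(w * (preds - mean[:, None]) ** 2, axis=1)
    return mean, between + sigma2


# Synthetic illustration only: one accurate and one noisy hypothetical model
rng = np.random.default_rng(0)
y_obs = rng.normal(6.0, 1.0, size=200)
cv_preds = np.column_stack([
    y_obs + rng.normal(0.0, 0.1, size=200),
    y_obs + rng.normal(0.0, 1.0, size=200),
])
weights, sigma2 = fit_bma_weights(cv_preds, y_obs)
mean, var = bma_predict(cv_preds, weights, sigma2)
```

In this sketch the predictive variance has two parts, the spread between ensemble members and the shared residual variance, which is one way a BMA ensemble can attach uncertainty to an otherwise deterministic prediction.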

Full Text