Soybeans are an important crop used for food, oil, and feed. However, China's soybean self-sufficiency is highly inadequate, with an annual import volume exceeding 80%. RGB cameras serve as powerful tools for estimating crop yield, and machine learning is a practical method based on various features, providing improved yield predictions. However, selecting different input parameters and models, specifically optimal features and model effects, significantly influences soybean yield prediction. This study used an RGB camera to capture soybean canopy images from both the side and top perspectives during the R6 stage (pod filling stage) for 240 soybean varieties (a natural population formed by four provinces in China: Sichuan, Yunnan, Chongqing, and Guizhou). From these images, the morphological, color, and textural features of the soybeans were extracted. Subsequently, feature selection was performed on the image parameters using a Pearson correlation coefficient threshold ≥0.5. Five machine learning methods, namely, CatBoost, LightGBM, RF, GBDT, and MLP, were employed to establish soybean yield estimation models based on the individual and combined image parameters from the two perspectives extracted from RGB images. (1) GBDT is the optimal model for predicting soybean yield, with a test set R2 value of 0.82, an RMSE of 1.99 g/plant, and an MAE of 3.12%. (2) The fusion of multiangle and multitype indicators is conducive to improving soybean yield prediction accuracy. Therefore, this combination of parameters extracted from RGB images via machine learning has great potential for estimating soybean yield, providing a theoretical basis and technical support for accelerating the soybean breeding process.
Read full abstract