Application of machine learning and genetic optimization algorithms for modeling and optimizing soybean yield using its component traits.

Mohsen Yoosefzadeh-Najafabadi, Milad Eskandari, Dan Tulpan, Qiang Zeng

doi:10.1371/journal.pone.0250665

Open Access

https://doi.org/10.1371/journal.pone.0250665

Journal: PLOS ONE	Publication Date: Apr 30, 2021
Citations: 36	License type: CC BY 4.0

Affiliation: University of Guelph

Abstract

Improving the genetic yield potential of major food-grade crops such as soybean (Glycine max L.) is the most sustainable way to address growing global food demand and the associated food security concerns. Yield is a complex trait that depends on several related variables called yield components. In this study, the five most important yield component traits in soybean were measured in a panel of 250 genotypes grown in four environments. These traits were the number of nodes per plant (NP), number of non-reproductive nodes per plant (NRNP), number of reproductive nodes per plant (RNP), number of pods per plant (PP), and the ratio of the number of pods to the number of nodes per plant (P/N). These data were used to predict total soybean seed yield with three machine learning (ML) algorithms, Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Random Forest (RF), applied individually and collectively through an ensemble method based on a bagging strategy (E-B). The RBF algorithm, with the highest coefficient of determination (R²) of 0.81 and the lowest mean absolute error (MAE) and root mean square error (RMSE) of 148.61 kg ha⁻¹ and 185.31 kg ha⁻¹, respectively, was the most accurate algorithm and was therefore selected as the metaClassifier for the E-B algorithm. The E-B algorithm increased prediction accuracy, improving R², MAE, and RMSE by 0.1, 0.24 kg ha⁻¹, and 0.96 kg ha⁻¹, respectively. Furthermore, for the first time, we combined the E-B algorithm with a genetic algorithm (GA) to model the optimum values of yield components for an ideotype genotype in which yield is maximized. The results provide a better understanding of the relationships between soybean yield and its components, which can be used for selecting parental lines and designing promising crosses for developing cultivars with improved genetic yield potential.
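The three accuracy measures named in the abstract (R², MAE, and RMSE) can be computed from paired observed and predicted yields. The sketch below is illustrative only: it assumes the common definition of R² as one minus the ratio of residual to total sum of squares, which may differ from the exact formula the authors used, and the function name and sample yields are hypothetical, not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, mae, rmse) for paired observed and predicted values.

    Assumption: R^2 = 1 - SS_res / SS_tot. The paper may define it differently.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical yields in kg/ha (not taken from the study).
observed = [3000.0, 3500.0, 4000.0, 4500.0]
predicted = [3100.0, 3400.0, 4100.0, 4400.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  MAE={mae:.2f} kg/ha  RMSE={rmse:.2f} kg/ha")
```

MAE and RMSE carry the units of yield (kg ha⁻¹), whereas R² is unitless, which is why the abstract reports the improvements in different units.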

Highlights

The linear correlation between soybean seed yield and pods per plant (PP) (r = 0.71) was the strongest, followed by its correlations with nodes per plant (NP) (r = 0.68), reproductive nodes per plant (RNP) (r = 0.67), and P/N (r = 0.64)
Among all the tested machine learning (ML) algorithms, R² reached its maximum value of 0.81 with the Radial Basis Function (RBF) algorithm, taking into account PP, NP, and RNP
The main objective of this study was to evaluate the potential use of yield component traits for estimating final seed yield in soybean using different ML algorithms and an ensemble method based on a bagging strategy (E-B), which in turn can be used by breeders for selecting parental lines and designing promising crosses for developing cultivars with improved genetic yield potential
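The r values in the highlights are linear correlation coefficients between yield and each component trait. As a minimal sketch, assuming these are Pearson coefficients (the page does not state the estimator), r can be computed as follows. The function name and the sample trait values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pods-per-plant counts and plot yields (kg/ha), not study data.
pods_per_plant = [22.0, 31.0, 38.0, 45.0, 52.0]
seed_yield = [2900.0, 3300.0, 3450.0, 3900.0, 4100.0]
print(f"r = {pearson_r(pods_per_plant, seed_yield):.2f}")
```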

Summary

Objectives

This study aimed to evaluate whether soybean yield components (NP, PP, RNP, NRNP, and P/N) can be used to predict final seed yield, using individual ML algorithms as well as an ensemble method based on a bagging strategy (E-B). The resulting models can in turn be used by breeders for selecting parental lines and designing promising crosses for developing cultivars with improved genetic yield potential.
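The general idea behind a bagging ensemble is to train several copies of a model on bootstrap resamples of the data and average their predictions. The sketch below is not the authors' E-B implementation: it substitutes a one-variable least-squares line for the MLP, RBF, and RF learners, and every name and data value in it is hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Every resampled x is identical: fall back to predicting the mean.
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Fit n_models lines, each on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Average the predictions of all fitted lines."""
    return sum(a + b * x for a, b in models) / len(models)


# Hypothetical, exactly linear data: pods per plant vs. yield (kg/ha).
pods = [20.0, 30.0, 40.0, 50.0, 60.0]
yields = [2000.0 + 40.0 * p for p in pods]
models = bagged_fit(pods, yields)
estimate = bagged_predict(models, 45.0)
print(f"bagged estimate at 45 pods per plant: {estimate:.1f} kg/ha")
```

Averaging over resamples mainly reduces the variance of unstable learners. With a stable learner such as the straight line used here the benefit is small, so the example shows only the mechanics.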
