Abstract
Genomic selection (GS) is a predictive methodology that is revolutionizing plant and animal breeding. However, applying GS in practice is challenging because a successful implementation requires accurately identifying the best lines. For this reason, several approaches have been proposed to select the top (or bottom) lines with greater precision. Although these methods vary in popularity and some are notably more efficient than others, this paper examines the fundamentals of these techniques. We used five models/methods: (1) RC, the Bayesian Genomic Best Linear Unbiased Predictor (GBLUP); (2) R, which is like RC but uses a threshold; (3) RO, Regression Optimum, which leverages the RC model in its training process to fine-tune the threshold; (4) B, the Threshold Bayesian Probit Binary model (TGBLUP), with a threshold of 0.5 to classify the cultivars as top or non-top; and (5) BO, which is TGBLUP with an optimal probability threshold that guarantees similar Sensitivity and Specificity. We also present a benchmark comparison of existing approaches for selecting the top (or bottom) performers, using five real datasets. For methods that require rigorous tuning, we suggest a streamlined tuning approach that significantly decreases implementation time without notably compromising performance. Our analysis revealed that the RO method outperformed the other models across the five real datasets in terms of the F1 score. Specifically, RO was more effective than models R, B, RC, and BO by 60.87, 42.37, 17.63, and 9.62%, respectively. In terms of the Kappa coefficient, the RO model was better than models B, BO, R, and RC by 37.46, 36.21, 52.18, and 3.95%, respectively. In terms of Sensitivity, the RO model outperformed models B, R, and RC by 145.74, 250.41, and 86.20%, respectively. The second-best model was BO.
In the first stage, the BO and RO approaches train a classification model and a regression model, respectively, to classify the lines as top (bottom) or not top (not bottom). In the second stage, both approaches optimize the threshold used to classify the lines so that it minimizes the difference between Sensitivity and Specificity. In our comparison, the BO and RO methods were the best at selecting the top (or bottom) lines. For this reason, we encourage breeders to adopt these approaches to increase genetic gain in plant breeding programs.
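
The second-stage idea, choosing the threshold that minimizes the difference between Sensitivity and Specificity, can be illustrated with a minimal sketch. The function names, the grid of candidate thresholds, and the toy data below are assumptions made for illustration only; they are not the paper's implementation or its datasets. The scores stand in for either predicted probabilities (as in BO) or predicted values (as in RO).

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called 'top'."""
    tp = fn = tn = fp = 0
    for truth, score in zip(y_true, scores):
        predicted_top = score >= threshold
        if truth == 1:
            if predicted_top:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_top:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def balanced_threshold(y_true, scores, candidates=None):
    """Pick the candidate threshold with the smallest |sensitivity - specificity|.

    Assumption: candidates default to the observed scores; ties keep the
    smallest threshold.
    """
    if candidates is None:
        candidates = sorted(set(scores))
    best_t, best_gap = None, float("inf")
    for t in candidates:
        sens, spec = sensitivity_specificity(y_true, scores, t)
        gap = abs(sens - spec)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


# Hypothetical toy data: 1 = top line, 0 = non-top line (3 top out of 10).
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.42, 0.30, 0.20, 0.15, 0.10, 0.35, 0.05, 0.25]

threshold = balanced_threshold(y_true, scores)
print(threshold, sensitivity_specificity(y_true, scores, threshold))
print(0.5, sensitivity_specificity(y_true, scores, 0.5))
```

In this toy case no score reaches 0.5, so the fixed threshold selects no top lines at all, whereas the balanced threshold trades a little Specificity for a much higher Sensitivity. In practice the threshold would be tuned on training data only and then applied to the lines being predicted.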