Abstract
Random k-fold cross-validation in genome-wide selection (GWS) can help to estimate predictive ability ($$ {r}_{y\hat{y}} $$). Predictive ability tends to be higher when training, and validation sets present a high degree of kinship. However, many tree breeding populations are less genetically related to the training sets and have different levels of phenotypic diversity. Therefore, this study proposes methods of splitting k-fold cross-validation sets to optimize $$ {r}_{y\hat{y}} $$ estimates that are consistent with the breeding population and verify the impact of phenotypic and genotypic distribution on GWS. Using a simulated Eucalyptus trait (h2=0.5) and Pinus taeda L. data for diameter at breast height (h2=0.31), six methods were developed based on mutual information (I) and entropy (H) for measuring genetic similarity and phenotypic dissimilarity, respectively. All methods were evaluated for $$ {r}_{y\hat{y}} $$, bias, minimum squared error of prediction, and genomic heritability. The Pearson correlations of these parameters with the kinship coefficient, and I and H between and within training and validation sets were also estimated. Our results show that closer genetic similarity did not significantly increase $$ {r}_{y\hat{y}} $$ and that a lower H reduced $$ {r}_{y\hat{y}} $$ and overestimated genomic breeding values. Consequently, phenotypic diversity (high H) should be added to tree breeding populations to increase genetic gain and reduce bias. The new methods accurately fitted models according to the entropy of tree breeding populations and their genetic relationship to the training sets. Therefore, these methods provided usable estimates of genetic gain to produce consistent success of long-term tree breeding programs.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.