Abstract

Genomic selection (GS) is a strategy to predict the genetic merits of individuals using genome-wide markers. However, GS prediction accuracy is affected by many factors, including the missing rate and minor allele frequency (MAF) of the genotypic data, the GS model, and trait features. In this study, we used one wheat population to investigate the prediction accuracies of various GS models for yield and yield-related traits under various quality control (QC) scenarios, with missing genotype imputation, and with markers derived from genome-wide association studies (GWAS). Missing rate and MAF of single nucleotide polymorphism (SNP) markers were the two major QC factors. Five missing rate levels (0%, 20%, 40%, 60%, and 80%) and three MAF levels (0%, 5%, and 10%) were considered, and five-fold cross-validation was used to estimate prediction accuracy. The results indicated that a moderate missing rate level (20% to 40%) and a MAF threshold of 5% provided better prediction accuracy. Under this QC scenario, prediction accuracies were further calculated for imputed and GWAS-derived markers. The accuracies for the six traits were related to their heritability and genetic architecture, as well as to the GS prediction model. Moore–Penrose generalized inverse (GenInv), ridge regression (RidgeReg), and random forest (RForest) resulted in higher prediction accuracies than the other GS models across traits. Imputation of missing genotypic data had a marginal effect on prediction accuracy, while GWAS-derived markers improved prediction accuracy in most cases. These results demonstrate that QC on missing rate and MAF had a positive impact on the predictability of GS models. We did not identify a single combination of QC scenarios that outperformed the others for all traits and GS models. However, the balance between marker number and marker quality is important for the deployment of GS in wheat breeding. GWAS can select the markers most strongly related to the traits and can therefore be used to improve the prediction accuracy of GS.
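
The QC step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes genotypes coded as 0, 1, or 2 copies of one allele with `None` for a missing call, treats the missing rate level as an upper bound and the MAF level as a lower bound (both inclusive), and uses hypothetical marker names and function names.

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of individuals with no genotype call at this marker."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF estimated from the non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(calls: Sequence[Genotype],
              max_missing: float = 0.20,
              min_maf: float = 0.05) -> bool:
    """Keep a marker if it is not too sparse and not too rare (assumed inclusive bounds)."""
    return (missing_rate(calls) <= max_missing
            and minor_allele_frequency(calls) >= min_maf)


# Hypothetical toy data: three SNPs scored on ten individuals.
markers: Dict[str, List[Genotype]] = {
    "snp_a": [0, 2, 2, 0, None, 2, 0, 2, 0, 2],                # 10% missing, common alleles
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],                   # monomorphic, MAF = 0
    "snp_c": [None, None, None, None, None, 2, 0, 2, 0, 2],    # 50% missing
}

kept = [name for name, calls in markers.items() if passes_qc(calls)]
print(kept)
```

Raising `max_missing` or lowering `min_maf` retains more markers at the cost of marker quality, which is the trade-off between marker number and marker quality that the abstract refers to.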

Highlights

  • Wheat (Triticum aestivum L.) is one of the major cultivated crops, grown on approximately 200 million hectares worldwide, and supplies one fifth of the total caloric demand of the global population [1].

  • Previous efforts to compare the predictive ability of various genomic selection (GS) models in wheat showed good performance of random forest (RF) and reproducing kernel Hilbert space (RKHS) for traits of interest, but no single GS model outperformed the others in all cases [9,10].

  • Our results revealed that quality control (QC) for missing rate and minor allele frequency (MAF) affected genome coverage (Table S2).


Summary

Introduction

Wheat (Triticum aestivum L.) is one of the major cultivated crops, grown on approximately 200 million hectares worldwide, and supplies one fifth of the total caloric demand of the global population [1]. In GS, a prediction model is trained on a population that is both genotyped and phenotyped; the trained model is then used to predict genomic estimated breeding values (GEBVs) in a validating population (VP), which is only genotyped. Whole-genome regression methods based on ordinary least squares cannot estimate all marker effects simultaneously because of insufficient degrees of freedom. To address this issue, various classical statistical, Bayesian, and machine learning methods have been proposed for predicting the genetic merits of individuals [7]. These methods differ from each other mainly in their assumptions about the estimation of breeding values and variances in quantitative traits, and in their computational complexity [2,7]. Previous efforts to compare the predictive ability of various GS models in wheat showed good performance of random forest (RF) and RKHS for traits of interest, but no single GS model outperformed the others in all cases [9,10].
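
The degrees-of-freedom problem can be written out briefly. The notation below is an assumption for illustration and is not taken from the paper: n individuals, p markers with p much larger than n, a centered phenotype vector y, a centered marker matrix X, and a ridge penalty λ. The ridge and Moore–Penrose estimators are shown in their textbook forms, which may differ in detail from the RidgeReg and GenInv implementations evaluated in the study.

```latex
% Assumed notation: y is n x 1, X is n x p (both centered), beta is p x 1, p >> n.
\begin{align}
  y &= X\beta + \varepsilon \\
  % Ordinary least squares needs X^T X (p x p) to be invertible,
  % but rank(X^T X) = rank(X) <= n < p, so the inverse does not exist.
  \hat{\beta}_{\mathrm{OLS}} &= (X^{\top}X)^{-1}X^{\top}y \\
  % Ridge regression adds a positive penalty, which makes the matrix invertible.
  \hat{\beta}_{\mathrm{ridge}} &= (X^{\top}X + \lambda I_p)^{-1}X^{\top}y,
      \qquad \lambda > 0 \\
  % The Moore-Penrose inverse X^+ returns the minimum-norm least-squares solution,
  % which is also the limit of the ridge estimate as lambda tends to zero.
  \hat{\beta}_{\mathrm{pinv}} &= X^{+}y
      = \lim_{\lambda \to 0^{+}} (X^{\top}X + \lambda I_p)^{-1}X^{\top}y
\end{align}
```

Both alternatives sidestep the singular matrix: ridge regression shrinks all marker effects toward zero, while the generalized inverse selects the smallest-norm solution among the infinitely many that fit the training data equally well.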
