Abstract

Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulates, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data or a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not consider the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic-BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using SSI with additive relationships (GSSI) led to 5–17% increases in accuracy relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.

Highlights

  • Almost two decades have passed since Genomic Selection (GS) was first proposed by Meuwissen et al (2001).

  • Genomic prediction methods: we considered four prediction models: genomic-BLUP (GBLUP) using additive genomic relationships (VanRaden 2008); Reproducing Kernel Hilbert Spaces (RKHS) regression (Gianola et al 2006; de los Campos et al 2010), which is equivalent to a GBLUP with a non-linear kernel; and sparse selection indices (SSI) obtained by imposing an L1-penalty on a selection index, using either additive genomic relationships (GSSI) or a Gaussian kernel (KSSI).

  • The germplasm used in this study is derived from different biparental families across 4 years.
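
The relationship between the four models listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes markers coded as 0/1/2 allele counts, a Gaussian kernel scaled by the mean squared distance (the `bandwidth` parameter and that scaling are illustrative choices), and a variance ratio `ratio` = error variance / genetic variance that would normally be estimated from data. All function names are hypothetical. With `lam = 0` the index weights reduce to the BLUP weights (GBLUP or KBLUP, depending on the matrix used); with `lam > 0` the L1-penalty sets some weights to zero, so each predicted individual gets its own training subset.

```python
import numpy as np

def vanraden_g(M):
    """Additive genomic relationships from an n x p matrix of 0/1/2 allele counts."""
    p = M.mean(axis=0) / 2.0                  # estimated allele frequencies
    Z = M - 2.0 * p                           # center each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gaussian_kernel(M, bandwidth=1.0):
    """K_ij = exp(-bandwidth * d_ij^2 / mean(d^2)); the scaling is an assumption."""
    sq = np.sum(M ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (M @ M.T), 0.0)
    mean_d2 = D2[np.triu_indices_from(D2, k=1)].mean()
    return np.exp(-bandwidth * D2 / mean_d2)

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def sparse_index_weights(K_tt, k_ti, ratio, lam, n_iter=200):
    """Coordinate descent for
         min_b 0.5 * b'(K_tt + ratio*I) b - k_ti' b + lam * sum(|b|),
    where K_tt holds training-by-training relationships and k_ti the
    relationships between the training set and one predicted individual."""
    A = K_tt + ratio * np.eye(K_tt.shape[0])
    b = np.zeros(len(k_ti))
    for _ in range(n_iter):
        for j in range(len(b)):
            # Partial residual for coordinate j, excluding its own contribution
            r = k_ti[j] - A[j] @ b + A[j, j] * b[j]
            b[j] = soft_threshold(r, lam) / A[j, j]
    return b

# Toy data (simulated, for illustration only): 40 training lines, 1 predicted line
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(41, 500))
y_trn = rng.normal(size=40)
y_trn = y_trn - y_trn.mean()                  # centered training phenotypes

G = vanraden_g(M)
w_gblup = sparse_index_weights(G[:40, :40], G[:40, 40], ratio=1.0, lam=0.0)
w_gssi = sparse_index_weights(G[:40, :40], G[:40, 40], ratio=1.0, lam=0.05)

pred_gblup = w_gblup @ y_trn                  # dense index: all training lines used
pred_gssi = w_gssi @ y_trn                    # sparse index: only lines with nonzero weight
n_support = int(np.count_nonzero(w_gssi))     # size of the individual-specific training set
```

Replacing `G` with `gaussian_kernel(M)` in the last block gives the kernel counterparts (KBLUP for `lam = 0`, KSSI for `lam > 0`). In practice the penalty `lam` would be tuned, for example by cross-validation within the training data.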


Introduction

Almost two decades have passed since Genomic Selection (GS) was first proposed by Meuwissen et al (2001). This groundbreaking idea was quickly adopted for breeding dairy cattle (Hayes et al 2009), beef cattle (Garrick 2011), broilers (Wolc et al 2016), maize (Bernardo and Yu 2007), wheat (Poland et al 2012), and many other animal and crop species (Xu et al 2020). Investments by public and private organizations led to the development of large genomic data sets comprising DNA sequences and phenotypes. The large sample sizes of modern genomic data sets have increased our ability to accurately train high-dimensional genomic prediction models (Howard et al 2019). Riedelsheimer et al (2013) and Jacobson et al (2014) reported that prediction accuracy was higher when models were trained using data from biparental families that shared at least one parent, relative to training using data from all the available biparental families.
