Abstract

The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and selection (GS) in animal and plant breeding. However, when the number of observations (n) is large (thousands or millions), computational difficulties when handling these large genomic kernel relationship matrices (inverting and decomposing) increase exponentially. This problem increases when genomic × environment interaction and multi-trait kernels are included in the model. In this research we propose selecting a small number of lines m(m < n) for constructing an approximate kernel of lower rank than the original and thus exponentially decreasing the required computing time. First, we describe the full genomic method for single environment (FGSE) with a covariance matrix (kernel) including all n lines. Second, we select m lines and approximate the original kernel for the single environment model (APSE). Similarly, but including main effects and G × E, we explain a full genomic method with genotype × environment model (FGGE), and including m lines, we approximated the kernel method with G × E (APGE). We applied the proposed method to two different wheat data sets of different sizes (n) using the standard linear kernel Genomic Best Linear Unbiased Predictor (GBLUP) and also using eigen value decomposition. In both data sets, we compared the prediction performance and computing time for FGSE versus APSE; we also compared FGGE versus APGE. Results showed a competitive prediction performance of the approximated methods with a significant reduction in computing time. Genomic prediction accuracy depends on the decay of the eigenvalues (amount of variance information loss) of the original kernel as well as on the size of the selected lines m.

Highlights

  • The behavior of the cycles is similar for Full Genomic Method Single-Environment Model (FGSE) and APSE for 4000 wheat lines for Km,m, Kn,m, but genomic-enabled prediction values are lost as the number of lines included in the

  • Using the approximate model APGE and only 25% of the total training set, that is, m = 9021, such that matrices Km,m, and Kn,m, are of manageable sizes of order 9021 × 9021 and 45099 × 9021, respectively, this gives a genomic prediction accuracy of 0.427, with a residual variance of 0.302, that is, there is no loss of genomic prediction accuracy with respect to the full genomic models with genotype × environment interactions (GE) (FGGE model 5)

  • The computing time required, including the time for preparing the matrices for the approximation method, and the time for the eigenvalue decomposition and the 20,000 iterations, was 30,670 seconds. This was very similar for the prediction of the other cycles, and the only differences were in the computing time consumed between FGGE and APGE; this difference exponentially increased with the total number of training data

Read more

Summary

Introduction

The rapid development of molecular markers and sequencing technologies has made it possible to use genomic prediction (GP) and selection (GS) in animal and plant breeding (Meuwissen et al, 2001), and practical evidence in plant and animal breeding data has shown that GS provides important prediction accuracy for GS-assisted breeding (Meuwissen et al, 2001; Crossa et al, 2010, 2011; de los Campos et al, 2010; Pérez-Rodríguez et al, 2012).Approximate Genome-Based Kernel ModelsAdditive genetic effects can be predicted directly from marker effects by Ridge Regression best linear unbiased prediction (rrBLUP) (Endelman, 2011) and/or by employing Bayesian inference (Meuwissen et al, 2001), and/or developing the genomic relationship linear kernel matrix (G) to fit the GBLUP (VanRaden, 2008). Departures from linearity can be assessed by semi-parametric approaches, such as mixed models with non-additive covariance structure defined in the Reproducing Kernel Hilbert Space (RKHS) framework or more complicated prediction methods such as neural networks (Gianola et al, 2006; Gianola and van Kaam, 2008; de los Campos et al, 2010; González-Camacho et al, 2012; Pérez-Rodríguez et al, 2012). Gianola et al (2006, 2014) suggested using RKHS regression for semi-parametric, genomicenabled prediction and pointed out that non-parametric methods such as kernel regression are necessary to reduce the dimension of the parametric space, and to be able to capture complex cryptic interaction among markers.

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call