Abstract
One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods, used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Furthermore, another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel “trick” concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Some parametric and kernel methods; least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
Highlights
Since the seminal contribution of Meuwissen et al (2001), genomic selection (GS) has become a popular strategy for genetic improvement of livestock species and plants
Among all the compared methods, reproducing kernel Hilbert space (RKHS) regression and support vector machine regression (SVR) were regularly the most accurate methods for prediction followed by genomic best linear unbiased predictor (GBLUP) and least absolute shrinkage and selection operator (LASSO)
LASSO was often the least accurate method for prediction. This can be explained by the fact that, for situations where p > n, the predictive performance of LASSO is regularly dominated by Ridge regression (i.e., GBLUP) when covariates are highly correlated (Tibshirani, 1996; Zou and Hastie, 2005)
Summary
Since the seminal contribution of Meuwissen et al (2001), genomic selection (GS) has become a popular strategy for genetic improvement of livestock species and plants. Later Howard et al (2014) showed that RKHS and support vector machine regression (SVR), when dealing with an additive genetic architecture, could be almost as competitive as parametric methods such as best linear unbiased predictor (BLUP), least absolute shrinkage and selection operator (LASSO) or Bayesian linear regressions (Bayes A, Bayes B, Bayes C, Bayes Cπ, and Bayesian LASSO). These authors showed that RKHS regression and SVR, with some other non parametric methods, clearly outperformed parametric methods for an epistatic genetic architecture
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.