Abstract

Background: Genome-wide prediction (GWP) has become the method of choice in animal and plant breeding. Prediction of breeding values and phenotypes is routinely performed using large genomic data sets, with the number of markers on the order of several thousands to millions. The number of evaluated individuals is usually smaller, which results in problems where model sparsity is of major concern. The LASSO technique has proven to be well suited for sparse problems, often providing excellent prediction accuracy. Several computationally efficient LASSO algorithms have been developed, but optimization of hyper-parameters can be demanding.

Results: We have developed a novel automatic adaptive LASSO (AUTALASSO) based on the alternating direction method of multipliers (ADMM) optimization algorithm. The two major hyper-parameters of ADMM are the learning rate and the regularization factor. The learning rate is automatically tuned with line search, and the regularization factor is optimized using golden section search. Results show that AUTALASSO provides superior prediction accuracy when evaluated on simulated and real bull data compared to the adaptive LASSO, LASSO and ridge regression implemented in the popular glmnet software.

Conclusions: The AUTALASSO provides a very flexible and computationally efficient approach to GWP, especially when it is important to obtain high prediction accuracy and genetic gain. The AUTALASSO also has the capability to perform genome-wide association studies (GWAS) of both additive and dominance effects with smaller prediction error than the ordinary LASSO.

Highlights

  • Genome-wide prediction has become the method of choice in animal and plant breeding

  • The purpose of this study is to introduce proximal algorithms, with a special focus on the alternating direction method of multipliers (ADMM), into a genome-wide prediction (GWP) framework, and to develop a general approach that automatically finds the optimal values of the learning rate and the regularization parameter of an adaptive LASSO

  • The AUTALASSO completed in 190 s and resulted in an MSE_test of 64.34 and an r_test of 0.676
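The abstract states that the regularization factor is optimized with golden section search. The following is a minimal sketch of that search technique in general, not the authors' implementation. The function names `golden_section_search` and `toy_validation_error`, the search interval, and the quadratic error curve are all hypothetical and chosen only for illustration. In practice the objective would be a validation error computed by refitting the model for each candidate value, and the method assumes that this error has a single minimum on the interval.

```python
import math

# Inverse golden ratio, about 0.618: the factor by which the bracket shrinks each step.
INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0


def golden_section_search(f, lo, hi, tol=1e-5, max_iter=200):
    """Approximate the minimizer of a unimodal function f on [lo, hi]."""
    # Two interior probe points placed symmetrically inside the bracket.
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        if fc < fd:
            # The minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


def toy_validation_error(log_lambda):
    # Hypothetical stand-in for a validation error curve (assumption, not real data):
    # a smooth bowl with its minimum at log10(lambda) = 0.3.
    return (log_lambda - 0.3) ** 2 + 1.0


# Search over log10(lambda) in [-3, 3]; each iteration costs one new evaluation.
best_log_lambda = golden_section_search(toy_validation_error, -3.0, 3.0)
best_lambda = 10.0 ** best_log_lambda
```

Searching on a log scale is a design choice of this sketch: regularization parameters often span several orders of magnitude, so a log scale keeps the bracket well conditioned. Because one probe point is reused at every step, only one new model fit would be needed per iteration.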

Summary

Introduction

Prediction of breeding values and phenotypes is routinely performed using large genomic data sets, with the number of markers on the order of several thousands to millions. Since the number of individuals is usually smaller, in the range of some hundreds to a few thousands, the result is a multivariate high-dimensional statistical issue that is often referred to as the p >> n problem [4, 5]. Regularization is a mathematical technique to impose prior information on the structure of the solution to an optimization problem. It closely resembles the task of using priors in Bayesian statistics. It is well established that the LASSO usually results in better prediction accuracy than ridge regression if the predictors display low to moderate correlation with each other [4, 9, 10].
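To make the difference between the two penalties concrete, the sketch below compares how LASSO and ridge regression shrink individual coefficients. It relies on a simplifying assumption that is not stated in the text above: an orthonormal design, with objectives (1/2)||y - Xb||^2 + lam * ||b||_1 for the LASSO and (1/2)||y - Xb||^2 + (lam/2) * ||b||^2 for ridge. Under that assumption each coefficient can be obtained from its least-squares estimate separately. The function names and the numeric values are hypothetical.

```python
def soft_threshold(b, lam):
    """LASSO update under the orthonormal-design assumption.

    This is also the proximal operator of the L1 penalty used by proximal methods.
    """
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # estimates with |b| <= lam are set exactly to zero


def ridge_shrink(b, lam):
    """Ridge update under the same assumption: proportional shrinkage, never exactly zero."""
    return b / (1.0 + lam)


# Hypothetical least-squares estimates for four markers (illustrative values only).
ols_estimates = [2.5, -0.25, 0.125, -1.75]
lam = 0.5

lasso_estimates = [soft_threshold(b, lam) for b in ols_estimates]
ridge_estimates = [ridge_shrink(b, lam) for b in ols_estimates]

print("LASSO:", lasso_estimates)  # small effects are removed from the model
print("ridge:", ridge_estimates)  # every effect is kept, only scaled down
```

In this toy case the LASSO zeroes the two small effects and keeps the two large ones, while ridge retains all four. This illustrates why the LASSO yields sparse models in p >> n settings; it does not by itself demonstrate the prediction accuracy result cited above.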
