The effect of different approaches to determining the regularization parameter of Bayesian LASSO on the accuracy of genomic prediction.

Hamid Sahebalam, Mohsen Gholizadeh, Seyed Hassan Hafezian

doi:10.1007/s00335-024-10088-7

Abstract

Using dense genomic markers opens up new opportunities and challenges for breeding programs. The need to penalize marker-specific regression coefficients becomes particularly important when dense markers are available. Therefore, fitting the marker effects to observations using a regularization technique, such as Bayesian LASSO (BL) regression, is of great interest. When the Laplace prior distribution is applied to the regression coefficients, BL can be interpreted as a regularization of the L1 norm based on the Bayesian approach. A critical issue is the appropriate selection of hyperparameter values in the prior distributions of regularization techniques, as these values essentially control the sparsity of the estimated model. The purpose of this study was to evaluate different approaches for selecting the regularization parameter in BL: fully Bayesian approaches, such as a gamma prior (BL_Gamma), a beta prior (BL_Beta), and a fixed prior (BL_Fixed), as well as data-driven approaches such as cross-validation based on mean square error (BL_CV_MSE) and prediction accuracy (BL_CV_PA). Additionally, information-criteria-based methods, including Akaike's information criterion (BL_AIC), the Bayesian information criterion (BL_BIC), and the deviance information criterion (BL_DIC), were explored. For this purpose, a genome containing eight chromosomes (each 1 Morgan in length) with 100 randomly distributed quantitative trait loci was simulated. The studied scenarios were as follows: scenario 1 involved 4000 markers and a heritability of 0.2; scenario 2 involved 4000 markers and a heritability of 0.6; scenario 3 involved 16,000 markers and a heritability of 0.2; and scenario 4 involved 16,000 markers and a heritability of 0.6. The results showed that, among the fully Bayesian and cross-validation approaches, BL_Gamma, BL_Beta, and BL_CV_MSE provided the highest prediction accuracy (PA) in scenarios 1 and 3. With increased marker density and heritability (scenario 4), the cross-validation approaches performed slightly better. The information-criteria-based methods demonstrated the lowest PA. Increasing heritability and marker density led to a decrease and an increase in the model penalty on the regression coefficients, respectively. The PA obtained in the target population ranged from 0.210 to 0.413 in scenario 1, 0.402 to 0.600 in scenario 2, 0.256 to 0.442 in scenario 3, and 0.478 to 0.653 in scenario 4. In general, fully Bayesian approaches based on random priors for the regularization parameter are recommended for BL, as they provide acceptable PA with lower computational loads.
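
The link between the Laplace prior and the L1 penalty can be made concrete with a minimal sketch under simplifying assumptions: a Gaussian likelihood with residual variance σ², p markers, and an independent Laplace prior with rate λ on each marker effect β_j. The symbols y, X, β, σ² and λ are introduced here for illustration only and are not taken from the abstract; the exact parameterization used in the study (for example, a prior conditioned on σ) may differ.

```latex
% Assumed model: y = X\beta + e, with e ~ N(0, \sigma^2 I)
% Assumed prior: independent Laplace densities on the marker effects
p(\beta_j \mid \lambda) = \frac{\lambda}{2}\exp\!\left(-\lambda\,|\beta_j|\right),
\qquad j = 1,\dots,p

% Negative log-posterior of \beta for fixed \lambda and \sigma^2
-\log p(\beta \mid y, \lambda, \sigma^2)
  = \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} |\beta_j| + \text{const}

% The posterior mode is therefore an L1-penalized least-squares estimate
\hat{\beta}_{\text{mode}}
  = \arg\min_{\beta}\;
    \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} |\beta_j|
```

Under these assumptions, a larger λ implies stronger shrinkage of the marker effects toward zero. The approaches compared in the study differ in how λ is determined: assigned a prior, held fixed, or tuned by cross-validation or an information criterion.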

Full Text