Predictive ability of genome-assisted statistical models under various forms of gene action

Mehdi Momen, Ahmad Ayatollahi Mehrgardi, Gota Morota, Andreas Kranis, Ayyub Sheikhi, Llibertat Tusell, Guilherme J. M. Rosa, Daniel Gianola

doi:10.1038/s41598-018-30089-2

Abstract

Recent work has suggested that the performance of prediction models for complex traits may depend on the architecture of the target traits. Here we compared several prediction models with respect to their ability to predict phenotypes under various statistical architectures of gene action: (1) purely additive, (2) additive and dominance, (3) additive, dominance, and two-locus epistasis, and (4) purely epistatic settings. Simulated data and a real chicken dataset were used. Fourteen prediction models were compared: BayesA, BayesB, BayesC, Bayesian LASSO, Bayesian ridge regression, elastic net, genomic best linear unbiased prediction, a Gaussian process, LASSO, random forests, reproducing kernel Hilbert spaces regression, ridge regression (best linear unbiased prediction), relevance vector machines, and support vector machines. When the trait was under additive gene action, the parametric prediction models outperformed the non-parametric ones. Conversely, when the trait was under epistatic gene action, the non-parametric prediction models provided more accurate predictions. Thus, prediction models must be selected according to the most probable underlying architecture of the trait. In the chicken dataset examined, most models had similar prediction performance. Our results corroborate the view that there is no universally best prediction model, and that the development of robust prediction models is an important research objective.
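To make the four architectures of gene action concrete, the following is a minimal sketch of how genetic values could be assembled under each of them. It is not the simulation design used in the study. The genotype coding (0, 1, 2 copies of an allele), the allele frequency of 0.5, the heterozygote-only dominance deviation, the product form of two-locus epistasis, the effect distributions, and all function and label names are assumptions made for illustration.

```python
import random

# Hypothetical labels for the four architectures listed in the abstract.
ARCHITECTURES = (
    "additive",
    "additive_dominance",
    "additive_dominance_epistasis",
    "epistatic",
)


def simulate_genotypes(n_individuals, n_loci, rng):
    """Biallelic genotypes coded as allele counts 0, 1, or 2.

    Assumption: both alleles have frequency 0.5 at every locus.
    """
    return [
        [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_loci)]
        for _ in range(n_individuals)
    ]


def genetic_value(genotype, add_effects, dom_effects, pairs, epi_effects, architecture):
    """Genetic value of one individual under the chosen architecture."""
    # Additive part: effect times allele count at each locus.
    additive = sum(a * x for a, x in zip(add_effects, genotype))
    # Dominance part: deviation applied to heterozygotes only (assumed coding).
    dominance = sum(d for d, x in zip(dom_effects, genotype) if x == 1)
    # Two-locus epistasis: effect times the product of allele counts (assumed form).
    epistasis = sum(
        e * genotype[i] * genotype[j] for (i, j), e in zip(pairs, epi_effects)
    )
    if architecture == "additive":
        return additive
    if architecture == "additive_dominance":
        return additive + dominance
    if architecture == "additive_dominance_epistasis":
        return additive + dominance + epistasis
    if architecture == "epistatic":
        return epistasis
    raise ValueError(f"unknown architecture: {architecture}")


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed so the sketch is repeatable
    n_loci = 10
    genotypes = simulate_genotypes(5, n_loci, rng)
    add_effects = [rng.gauss(0, 1) for _ in range(n_loci)]
    dom_effects = [rng.gauss(0, 1) for _ in range(n_loci)]
    pairs = [(i, i + 1) for i in range(0, n_loci, 2)]  # adjacent loci interact
    epi_effects = [rng.gauss(0, 1) for _ in pairs]
    for arch in ARCHITECTURES:
        values = [
            genetic_value(g, add_effects, dom_effects, pairs, epi_effects, arch)
            for g in genotypes
        ]
        print(arch, [round(v, 2) for v in values])
```

Under this coding, a model that is linear in allele counts can represent the additive scenario exactly but cannot represent the product terms of the purely epistatic scenario, which is the kind of mismatch the model comparison examines.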

Highlights

The effectiveness of genomic prediction depends on the accuracy of estimation of the genetic value of individuals with yet-to-be observed phenotypes[1].
Real data offer the advantage of reflecting true complexity, whereas simulation allows one to explore how predictive performance is affected by factors such as the statistical genetic architecture of the trait, the number of markers used for the analysis, and the degree of relatedness between training and prediction populations[4].
The highest predictive and empirical accuracies were consistently obtained under Ad (0.56 and 0.90, respectively), in which the genetic values of individuals were influenced only by additive quantitative trait loci (QTL) effects.

Summary

Introduction

The effectiveness of genomic prediction depends on the accuracy of estimation of the genetic value of individuals with yet-to-be observed phenotypes[1]. Various factors affect the accuracy of estimated genomic breeding values (GEBVs) and the expected response to genomic selection. These include the model performance, training and testing sample sizes, relatedness between individuals in training and testing sets, marker density, and the statistical genetic architecture of target traits, i.e., the extent and distribution of linkage disequilibrium between markers and quantitative trait loci (QTL), number of QTLs, allelic frequencies and magnitude of QTL effects, and trait heritability[2,3]. Howard et al.[12] compared 14 genomic prediction models with 2000 biallelic markers by simulating two complex traits (explaining either 30% or 70% of the phenotypic variability) in F2 and backcross (BC) populations derived from crosses of inbred lines. They concluded that the parametric models predicted phenotypic values worse than the non-parametric models did when the gene action was epistatic. Predictive accuracy of all the models was assessed with a real chicken dataset.

Objectives

Methods

Results

Conclusion