Abstract

Background: Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods.

Methods: In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values.

Results: When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction.

Conclusions: Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.

Electronic supplementary material: The online version of this article (doi:10.1186/s12711-014-0075-3) contains supplementary material, which is available to authorized users.
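To make the comparison in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It contrasts a linear genomic kernel (GBLUP-like) with an RBF kernel and measures prediction accuracy as the correlation between observed phenotypes and predictions in a validation set. Everything specific here is an assumption made for illustration: the simulated 0/1/2 genotypes, the sample sizes, the VanRaden-style scaling of the linear kernel, the median-distance bandwidth of the RBF kernel, and the ridge parameter `lam` (which in GBLUP would correspond to the ratio of residual to genetic variance).

```python
import numpy as np

def linear_kernel(X):
    """Genomic relationship matrix from 0/1/2 genotypes (assumed VanRaden-style scaling)."""
    p = X.mean(axis=0) / 2.0                 # allele frequency per marker
    Z = X - 2.0 * p                          # centre genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def rbf_kernel(X):
    """RBF kernel on squared Euclidean distances; bandwidth set to the median distance (assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def kernel_ridge_predict(K, y, train, valid, lam=1.0):
    """Fit kernel ridge regression on the training animals and predict the validation animals."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    K_vt = K[np.ix_(valid, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K_vt @ alpha

def prediction_accuracy(observed, predicted):
    """Pearson correlation between observed phenotypes and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])

# Simulated data (purely illustrative, not the layer-hen data of the study).
rng = np.random.default_rng(0)
n_animals, n_markers = 300, 500
freqs = rng.uniform(0.1, 0.9, size=n_markers)
X = rng.binomial(2, freqs, size=(n_animals, n_markers)).astype(float)
g = X @ rng.normal(size=n_markers)
g = (g - g.mean()) / g.std()                 # standardised genetic values
y = g + rng.normal(size=n_animals)           # phenotype = genetic value + noise

train = np.arange(240)                       # training animals
valid = np.arange(240, n_animals)            # validation animals

K_lin = linear_kernel(X)
K_rbf = rbf_kernel(X)
acc_linear = prediction_accuracy(y[valid], kernel_ridge_predict(K_lin, y, train, valid))
acc_rbf = prediction_accuracy(y[valid], kernel_ridge_predict(K_rbf, y, train, valid))
print(f"linear kernel accuracy: {acc_linear:.3f}")
print(f"RBF kernel accuracy:    {acc_rbf:.3f}")
```

In a multi-line setting, `train` would index animals from one or more lines and `valid` animals from the evaluated line, so the same two functions could be reused to compare training-set compositions.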

Highlights

  • Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models

  • These results indicate that genomic best linear unbiased prediction (GBLUP), RRPCA, multi-trait genome-enabled best linear unbiased prediction (MTGBLUP) and RBFPCA gave reasonable results in terms of bias, as long as the evaluated line or a closely related line was included in the training dataset

  • When only information from a closely related line was used for training, the linear models and the non-linear radial basis function (RBF) models had similar performance, indicating that the strong assumptions of the linear models may at least partly hold for the closely related lines used in our study


Summary

Introduction

Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Genomic estimated breeding values (GEBV) are generally predicted by a regression model [1] trained on a set of animals with known phenotypes and genotypes for a dense marker panel that covers the genome [2]. The prediction accuracy of such models depends on several factors, of which the size of the training set is the most important; several studies [2,3] consistently claim that the biggest limitation for [...]. A few studies have proposed multi-trait linear models [14,15,16], in which trait by line combinations are modelled as separate but correlated traits, to try to accommodate these issues.
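As a hedged illustration of what "separate but correlated traits" means, a generic multi-trait GBLUP model for one trait recorded in the three lines B1, B2 and W1 can be written as follows. The notation ($\mu_k$, $\mathbf{Z}_k$, $\mathbf{T}$, $\mathbf{G}$, $t_{kl}$) is assumed here for illustration and is not necessarily the exact specification used in the study.

```latex
\begin{aligned}
\mathbf{y}_k &= \mathbf{1}\mu_k + \mathbf{Z}_k \mathbf{g}_k + \mathbf{e}_k,
  \qquad k \in \{\mathrm{B1}, \mathrm{B2}, \mathrm{W1}\}, \\[4pt]
\operatorname{Var}
\begin{pmatrix} \mathbf{g}_{\mathrm{B1}} \\ \mathbf{g}_{\mathrm{B2}} \\ \mathbf{g}_{\mathrm{W1}} \end{pmatrix}
  &= \mathbf{T} \otimes \mathbf{G},
  \qquad
  r_{kl} = \frac{t_{kl}}{\sqrt{t_{kk}\, t_{ll}}}.
\end{aligned}
```

Here $\mathbf{y}_k$ holds the phenotypes of line $k$, $\mathbf{g}_k$ the breeding values of all animals for the trait as expressed in line $k$, $\mathbf{Z}_k$ the incidence matrix linking records to animals, $\mathbf{G}$ the genomic relationship matrix across all animals, and $\mathbf{T}$ the $3 \times 3$ genetic covariance matrix between lines with elements $t_{kl}$. Because each animal belongs to a single line, residuals are taken to be uncorrelated across lines. An estimated genetic correlation $r_{kl}$ close to zero lets the model discount information from a distantly related line, whereas a single-trait model implicitly assumes $r_{kl} = 1$.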
