Modeling covariance structure based on genetic similarity between pairs of relatives plays an important role in evolutionary, quantitative and statistical genetics. Historically, genetic similarity between individuals has been quantified from pedigrees via the probability that randomly chosen homologous alleles between individuals are identical by descent (IBD). At present, however, many genetic analyses rely on molecular markers, with realized measures of genomic similarity replacing IBD-based expected similarities. Animal and plant breeders, for example, now employ marker-based genomic relationship matrices between individuals in prediction models and in estimation of genome-based heritability coefficients. Phenotypes convey information about genetic similarity as well. For instance, if phenotypic values are at least partially the result of the action of quantitative trait loci, one would expect phenotypes to be informative about those loci, as in genome-wide association studies. Statistically, a non-trivial conditional distribution of unknown genetic similarities, given phenotypes, is to be expected. A Bayesian formalism is presented here that applies to whole-genome regression methods where some genetic similarity matrix, e.g., a genomic relationship matrix, can be defined. Our Bayesian approach, based on phenotypes and markers, converts prior (markers only) expected similarity into trait-specific posterior similarity. A simulation illustrates situations under which effective Bayesian learning from phenotypes occurs. Pinus and wheat data sets were used to demonstrate applicability of the concept in practice. The methodology applies to a wide class of Bayesian linear regression models, extends to the multiple-trait domain, and can also be used to develop phenotype-guided similarity kernels in prediction problems.
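
For readers unfamiliar with the marker-based genomic relationship matrices mentioned above, the following is a minimal sketch of one common construction (VanRaden-type centering and scaling). It assumes genotypes coded 0/1/2 and allele frequencies estimated from the sample itself. The function name and the simulated genotypes are illustrative and do not come from the paper. The sketch covers only the prior, markers-only similarity, not the trait-specific posterior similarity that the paper develops.

```python
import numpy as np


def genomic_relationship_matrix(markers):
    """Return a VanRaden-type genomic relationship matrix.

    markers: (n individuals x p markers) array of allele counts coded 0/1/2.
    Allele frequencies are estimated from the sample (an assumption of
    this sketch; base-population frequencies could be used instead).
    """
    markers = np.asarray(markers, dtype=float)
    # Sample frequency of the counted allele at each marker.
    allele_freq = markers.mean(axis=0) / 2.0
    # Center each marker column at its expected value 2p.
    centered = markers - 2.0 * allele_freq
    # Scale by the summed expected marker variance, 2 * sum(p * (1 - p)).
    scale = 2.0 * np.sum(allele_freq * (1.0 - allele_freq))
    if scale == 0:
        raise ValueError("all markers are monomorphic; G is undefined")
    return centered @ centered.T / scale


# Hypothetical data: 5 individuals, 200 simulated markers.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(5, 200))
G = genomic_relationship_matrix(M)
```

Because each marker column is centered at its sample mean, the rows of `G` sum to zero, so off-diagonal entries express similarity relative to the sample average rather than absolute IBD probabilities.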