Abstract

Predicting genetic values is important in animal and plant breeding, personalized medicine and evolutionary biology. Traditionally, prediction is based on best linear unbiased prediction (BLUP) within a linear mixed model framework, with covariance structures obtained from relationship measures between individuals. Nowadays, single nucleotide polymorphism (SNP) data allow genomic information to be incorporated into the model (genomic BLUP, GBLUP). Prediction is also a central topic in geostatistics, where observations are correlated. There, the so-called “kriging” approach performs BLUP using parameterized covariance functions. In this thesis, the kriging concept is adopted to perform genomic prediction with the family of Matérn covariance functions, and kriging is compared to GBLUP in a whole-genome simulation study. The results of the simulation study suggest that kriging is superior to GBLUP in scenarios with non-additive gene action. The methodological development of genome-based prediction methods has become even more important with the increasing availability of whole-genome sequence data. This thesis provides the first worldwide application of phenotype prediction based on sequence data in a higher eukaryote, using the “Drosophila melanogaster Genetic Reference Panel”, which comprises sequences and phenotypic data of 157 inbred lines of the model organism Drosophila melanogaster. For the traits “starvation resistance” and “startle response”, GBLUP using 2.5 million SNPs to infer genomic relationships between individuals yields moderate predictive abilities. A Bayesian method with internal SNP selection does not achieve a higher predictive ability than GBLUP, and the predictive ability of GBLUP decreases only when fewer than 150,000 SNPs are used. For a third trait (“chill coma recovery”), the GBLUP approach fails completely.
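To make the contrast between GBLUP and kriging concrete, the following minimal Python/NumPy sketch builds two covariance structures from the same SNP matrix and plugs each into the same BLUP predictor. Several choices are assumptions for illustration, not the thesis's exact setup: a VanRaden-type genomic relationship matrix for GBLUP, Euclidean distances between genotype vectors as input to the Matérn function, a range ρ set to the median distance, a fixed smoothness ν = 1.5, a fixed variance ratio λ = 1, a training mean used in place of a GLS estimate, and simulated toy data. The function names (vanraden_g, matern, blup_predict) are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an n x m genotype matrix coded 0/1/2."""
    p = M.mean(axis=0) / 2.0                      # allele frequency per SNP
    Z = M - 2.0 * p                               # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def matern(d, rho=1.0, nu=1.5):
    """Matérn covariance (unit variance) via closed forms for nu in {0.5, 1.5, 2.5}."""
    s = d / rho
    if nu == 0.5:
        return np.exp(-s)
    if nu == 1.5:
        return (1.0 + np.sqrt(3.0) * s) * np.exp(-np.sqrt(3.0) * s)
    if nu == 2.5:
        return (1.0 + np.sqrt(5.0) * s + 5.0 * s**2 / 3.0) * np.exp(-np.sqrt(5.0) * s)
    raise ValueError("this sketch only handles nu in {0.5, 1.5, 2.5}")

def blup_predict(K, y, train, test, lam):
    """BLUP of test-line genetic values: mu + K_te,tr (K_tr,tr + lam I)^-1 (y_tr - mu).

    lam = sigma_e^2 / sigma_g^2 is assumed known; mu is the training mean
    (a simplification of the GLS estimate of the fixed effect).
    """
    mu = y[train].mean()
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(A, y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

# Toy data (assumed): random genotypes, purely additive effects, heritability ~0.5
rng = np.random.default_rng(0)
n, m = 120, 400
M = rng.binomial(2, 0.5, size=(n, m)).astype(float)
beta = rng.normal(0.0, 0.1, size=m)
g = M @ beta                                      # true genetic values
y = g + rng.normal(0.0, g.std(), size=n)          # phenotypes
train, test = np.arange(0, 100), np.arange(100, n)

# GBLUP: covariance from genomic relationships
G = vanraden_g(M)

# Kriging: Matérn covariance of Euclidean distances between genotype vectors
sq = (M**2).sum(axis=1)
D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * M @ M.T, 0.0))
K = matern(D, rho=np.median(D[D > 0]), nu=1.5)

pred_gblup = blup_predict(G, y, train, test, lam=1.0)
pred_krig = blup_predict(K, y, train, test, lam=1.0)

# Predictive ability: correlation between predicted and true genetic values
print(np.corrcoef(pred_gblup, g[test])[0, 1], np.corrcoef(pred_krig, g[test])[0, 1])
```

The toy effects are purely additive, so this sketch does not attempt to reproduce the non-additive scenarios in which the thesis reports an advantage for kriging; it only shows that both methods share one BLUP predictor and differ in the covariance matrix supplied to it.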
Based on detailed analyses and a corresponding two-marker genome-wide association study, two possible reasons for this failure are identified: the bimodal phenotypic distribution and an extensive network of epistatic interactions between SNPs. The accuracy of genomic prediction is also affected by the underlying structure of linkage disequilibrium (LD) between SNPs. Several formulae for the expected level of LD in finite populations have been proposed in the literature, most of them approximate. In this thesis, an alternative recursion formula for the development of LD over time is proposed. A simulation study shows that, for all parameter constellations considered, the proposed formula performs better than Sved's widely used formula. The theory of discrete-time Markov chains further allows the expected amount of LD at equilibrium to be derived, leading to a formula for the effective population size Ne. An analysis of how the inexactness of the recursion formula affects the steady state demonstrates that the resulting error in expected LD can be substantial. Using the human HapMap data, it is further shown that the Ne estimate strongly depends on the distribution of minor allele frequencies used as the basis for selecting SNPs for the analyses. Spanning a wide range of investigations at the interface of statistics, animal breeding and genetics, the findings of this thesis are of interest from both a practical and a methodological statistical point of view.
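The LD quantities mentioned above can be illustrated with a short Python sketch: r² between two loci from haplotype data, Sved's (1971) approximation E[r²] ≈ 1/(1 + 4Ne·c) for recombination rate c, and its inversion to an Ne estimate. The iterate_r2 recursion is an assumed, textbook-style example of LD changing from one generation to the next, included only to show how a steady state emerges; it is not the alternative recursion proposed in the thesis. All function names are hypothetical.

```python
import numpy as np

def r_squared(hap):
    """Squared correlation r^2 between two biallelic loci.

    hap: (n_haplotypes x 2) array of 0/1 alleles at locus A and locus B.
    """
    pA, pB = hap[:, 0].mean(), hap[:, 1].mean()
    pAB = np.mean((hap[:, 0] == 1) & (hap[:, 1] == 1))
    D = pAB - pA * pB                      # coefficient of linkage disequilibrium
    return D**2 / (pA * (1 - pA) * pB * (1 - pB))

def sved_r2(Ne, c):
    """Sved's approximation for expected r^2 at recombination rate c."""
    return 1.0 / (1.0 + 4.0 * Ne * c)

def ne_from_r2(mean_r2, c):
    """Invert Sved's approximation to obtain an Ne estimate from mean r^2."""
    return (1.0 / mean_r2 - 1.0) / (4.0 * c)

def iterate_r2(Ne, c, generations, r2_0=0.0):
    """Illustrative recursion (an assumption, NOT the thesis's formula):
    E[r2_{t+1}] = 1/(2 Ne) + (1 - 1/(2 Ne)) * (1 - c)^2 * E[r2_t]."""
    r2 = r2_0
    for _ in range(generations):
        r2 = 1.0 / (2 * Ne) + (1.0 - 1.0 / (2 * Ne)) * (1.0 - c) ** 2 * r2
    return r2

hap = np.array([[1, 1], [0, 0], [1, 1], [0, 0]])
print(r_squared(hap))                      # complete LD between the loci: r^2 = 1
Ne, c = 500, 0.01
print(sved_r2(Ne, c), iterate_r2(Ne, c, 2000))  # closed form vs. iterated steady state
print(ne_from_r2(sved_r2(Ne, c), c))       # recovers Ne up to floating-point rounding
```

For small c, the fixed point of this simple recursion lies close to Sved's value. Because an Ne estimate is obtained by inverting such a formula, any inaccuracy in the assumed recursion carries over directly into the estimated Ne.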

