Abstract

This study uses simulation to characterize the asymptotic behavior of genomic prediction R2 as the size of the reference population increases, for common or rare QTL alleles. Haplotypes derived from the whole-genome sequences of 85 Caucasian individuals from the 1000 Genomes Project were used to simulate random mating in a population of 10,000 individuals for at least 100 generations, producing a human-like LD structure for a large number of individuals. To reduce computational demands, only SNPs within a 0.1 M (Morgan) region of each of the first five chromosomes were used, so the total simulated genome length was 0.5 M. Obtaining the same genomic prediction R2 with a 30 M genome as with the 0.5 M genome would require a reference population 60-fold larger. Three scenarios, differing in the minor allele frequency distributions of markers and QTL, were considered for a trait with h2 = 0.8, resembling human height. Each scenario used 4,200 markers and 70 QTL. We treated prediction accuracy as an estimability problem, which yields an upper bound for the reliability of prediction and thus for prediction R2. The genomic prediction methods GBLUP, BayesB, and BayesC were compared. Our results imply that, for human height, the variable selection methods BayesB and BayesC applied to a 30 M genome have no advantage over GBLUP when the reference population is small (<6,000 individuals) but become superior as more individuals are added to it. All methods become asymptotically equivalent in terms of prediction R2, which approaches the genomic heritability when the reference population reaches 480,000 individuals.
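The 60-fold figure is the ratio of genome lengths (30 / 0.5 = 60). A minimal sketch of why that ratio appears, assuming the deterministic approximation of Daetwyler et al. (2008) rather than the simulation approach of this study: squared accuracy of genomic values is N·h2 / (N·h2 + Me), where Me is the number of independent chromosome segments, and prediction R2 against phenotypes is that quantity times h2. The sketch further assumes, purely for illustration, that Me grows linearly with genome length L; the constant `SEGMENTS_PER_MORGAN` and the helper names are hypothetical.

```python
# Hypothetical illustration using the Daetwyler et al. (2008) approximation;
# this is not the simulation-based method used in the study.

def expected_prediction_r2(n_ref: float, h2: float, m_e: float) -> float:
    """Expected squared correlation between predictions and phenotypes.

    Squared accuracy of genomic values: r2_g = N*h2 / (N*h2 + Me).
    Prediction R2 against phenotypes: r2_g * h2, which tends to h2 as N grows.
    """
    r2_g = n_ref * h2 / (n_ref * h2 + m_e)
    return r2_g * h2


H2 = 0.8                      # heritability used in the study (height-like trait)
SEGMENTS_PER_MORGAN = 100.0   # hypothetical constant c in Me = c * L


def effective_segments(genome_length_morgans: float) -> float:
    """Assumption: Me grows linearly with genome length L (in Morgans)."""
    return SEGMENTS_PER_MORGAN * genome_length_morgans


n_small = 1_000
scale = 30.0 / 0.5  # ratio of genome lengths

r2_short = expected_prediction_r2(n_small, H2, effective_segments(0.5))
r2_long_same_n = expected_prediction_r2(n_small, H2, effective_segments(30.0))
r2_long_scaled_n = expected_prediction_r2(scale * n_small, H2, effective_segments(30.0))
r2_very_large_n = expected_prediction_r2(1e9, H2, effective_segments(30.0))

print(f"0.5 M genome, N = {n_small:,}: R2 = {r2_short:.3f}")
print(f"30 M genome,  N = {n_small:,}: R2 = {r2_long_same_n:.3f}")
print(f"30 M genome,  N = {scale * n_small:,.0f}: R2 = {r2_long_scaled_n:.3f}")
print(f"30 M genome,  N = 1e9: R2 = {r2_very_large_n:.3f} (limit h2 = {H2})")
```

Under the linear-Me assumption, R2 depends only on N·h2 / Me, so multiplying L by 60 requires multiplying N by 60 to hold R2 fixed, and R2 tends to h2 as N grows. If Me does not scale linearly with L, the required factor would differ.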

Highlights

  • We approach GP as an estimability problem and provide upper bounds for the reliability of prediction and for prediction R2

Introduction

The availability of single nucleotide polymorphism (SNP) marker chips for many species has given rise to the era of genomic prediction (GP). In GP, the SNP genotypes and phenotypes of a group of individuals (hereafter called the reference population, RP) are used to estimate marker effects, which are then used to predict the breeding values or yet-to-be-observed phenotypes of genotyped individuals (hereafter called the validation population, VP) [1]. The accuracy of GP is influenced by many factors, such as the method used to estimate marker effects [9, 10], the heritability (h2) and genetic architecture of the trait [10, 11], and the size (nR) and structure of the RP [11,12,13,14,15,16]. The method and the size and structure of the RP can be chosen or designed using available knowledge about the heritability and genetic architecture of the trait.
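The RP/VP workflow described above can be sketched in a few lines. This is a minimal illustration, not the study's method: it assumes a toy population of unlinked loci, known variance components, and ridge-regression BLUP of marker effects (SNP-BLUP), which gives the same predictions as GBLUP under the matching variance parameterization. All sizes, the seed, and the helper names `genotypes` and `phenotypes` are hypothetical. BayesB and BayesC would replace the ridge step with variable-selection priors, typically fitted by MCMC.

```python
import numpy as np

rng = np.random.default_rng(2024)   # hypothetical seed

N_SNP, N_QTL = 500, 20          # hypothetical (the study used 4,200 markers and 70 QTL)
N_REF, N_VAL = 1_000, 500       # hypothetical RP and VP sizes
H2 = 0.8                        # heritability, as in the study

# Allele frequencies; genotypes coded 0/1/2 under Hardy-Weinberg with
# independent loci (no LD, unlike the study's sequence-based simulation).
p = rng.uniform(0.05, 0.5, size=N_SNP)

def genotypes(n: int) -> np.ndarray:
    return rng.binomial(2, p, size=(n, N_SNP)).astype(float)

# A random subset of markers acts as QTL with normally distributed effects.
beta = np.zeros(N_SNP)
qtl = rng.choice(N_SNP, size=N_QTL, replace=False)
beta[qtl] = rng.normal(size=N_QTL)

var_g = np.sum(2 * p * (1 - p) * beta**2)   # expected genetic variance
var_e = var_g * (1 - H2) / H2               # residual variance giving h2 = 0.8

def phenotypes(X: np.ndarray) -> np.ndarray:
    return X @ beta + rng.normal(scale=np.sqrt(var_e), size=X.shape[0])

X_ref, X_val = genotypes(N_REF), genotypes(N_VAL)
y_ref, y_val = phenotypes(X_ref), phenotypes(X_val)

# Center genotypes using RP means.
mu = X_ref.mean(axis=0)
Z_ref, Z_val = X_ref - mu, X_val - mu

# SNP-BLUP: (Z'Z + lambda*I) b = Z'(y - ybar), with lambda = var_e / var_beta
# and var_beta = var_g / sum(2pq), the usual GBLUP-equivalent parameterization.
var_beta = var_g / np.sum(2 * p * (1 - p))
lam = var_e / var_beta
lhs = Z_ref.T @ Z_ref + lam * np.eye(N_SNP)
rhs = Z_ref.T @ (y_ref - y_ref.mean())
b_hat = np.linalg.solve(lhs, rhs)

# Predict the VP and report prediction R2 (squared correlation with phenotypes).
y_pred = Z_val @ b_hat
r2 = np.corrcoef(y_pred, y_val)[0, 1] ** 2
print(f"prediction R2 in the VP: {r2:.3f}; h2 = {H2}")
```

Increasing `N_REF` in this sketch should push the VP prediction R2 toward h2, which is the asymptotic behavior the study characterizes with far more realistic simulations.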
