Abstract

While variance components analysis has emerged as a powerful tool in complex trait genetics, existing methods for fitting variance components do not scale well to large-scale datasets of genetic variation. Here, we present a method for variance components analysis that is accurate and efficient: it can estimate one hundred variance components on a million individuals genotyped at a million SNPs in a few hours. We illustrate the utility of our method in estimating and partitioning the variation in a trait explained by genotyped SNPs (SNP-heritability). Analyzing 22 traits with genotypes from 300,000 individuals across about 8 million common and low-frequency SNPs, we observe that per-allele squared effect size increases with decreasing minor allele frequency (MAF) and linkage disequilibrium (LD), consistent with the action of negative selection. Partitioning heritability across 28 functional annotations, we observe enrichment of heritability in FANTOM5 enhancers in asthma, eczema, thyroid and autoimmune disorders.

Highlights

  • We found that estimates from the summary-statistic methods tend to be sensitive to the underlying genetic architecture: across 16 architectures, relative biases range from −31% to 27% for LD score regression (LDSC), −27% to 5% for Stratified LD score regression (S-LDSC), and −5% to 9% for SumHer (Fig. 1)

  • In light of increasing evidence that SNP effect sizes vary as a function of covariates such as minor allele frequency (MAF) and linkage disequilibrium (LD), and of the bias associated with methods that fit only a single variance component[8], the ability to define flexible models endowed with multiple variance components is important to obtain unbiased estimates of SNP heritability

  • We confirm that RHE-mc yields accurate genome-wide SNP heritability estimates under diverse genetic architectures
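The relative bias figures quoted above can be illustrated with a short sketch. It assumes the usual definition, which this summary does not spell out: the mean estimate across simulation replicates minus the true value, divided by the true value. The function name and the replicate values are hypothetical and are not results from the paper.

```python
from statistics import mean


def relative_bias_percent(estimates, true_value):
    """Relative bias of the mean estimate, as a percentage of the true value.

    Assumed definition: 100 * (mean(estimates) - true_value) / true_value.
    """
    if true_value == 0:
        raise ValueError("true_value must be non-zero")
    return 100.0 * (mean(estimates) - true_value) / true_value


# Hypothetical heritability estimates from four simulation replicates
# of one architecture whose true SNP heritability is 0.25.
replicate_estimates = [0.20, 0.22, 0.18, 0.20]
bias = relative_bias_percent(replicate_estimates, 0.25)
print(f"relative bias: {bias:.1f}%")
```

A negative value means the method underestimates heritability on average for that architecture, and a positive value means it overestimates.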

Summary

Results

We ran each of these methods by partitioning SNPs into 24 variance components (6 MAF bins by 4 LD bins; see “Methods” section). To make these experiments computationally feasible, we simulated phenotypes starting from a smaller set of genotypes (M = 593,300 array SNPs and N = 10,000 white British individuals). For the small-scale simulations, we compared RHE-mc to GCTA-mc. We ran both methods by partitioning the SNPs into 24 variance components based on six MAF bins and four LD bins, the latter defined by quartiles of the LDAK weight at a SNP (see “Methods” section). We caution that negative heritability estimates in bins of lowest MAF and high LD score could arise from one or more of the following factors: the low number of SNPs in these bins (we did not constrain our variance component estimates to be non-negative), the inadequacy of the assumed heritability model, and errors in the imputed genotypes used for the analysis.
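The 6-by-4 partition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the MAF bin edges are hypothetical placeholders (the actual edges are given in the paper's Methods), the LD quartiles are assumed to be computed over all SNPs at once, and the function name `assign_components` is made up for this example.

```python
import numpy as np


def assign_components(maf, ld_measure, maf_edges):
    """Assign each SNP to one of (len(maf_edges) + 1) * 4 variance components.

    maf        : per-SNP minor allele frequencies
    ld_measure : per-SNP LD measure (e.g. an LDAK weight)
    maf_edges  : increasing interior MAF bin edges (hypothetical values here)
    """
    maf = np.asarray(maf, dtype=float)
    ld_measure = np.asarray(ld_measure, dtype=float)

    # MAF bin index in 0 .. len(maf_edges); edge i-1 <= maf < edge i maps to bin i.
    maf_bin = np.digitize(maf, maf_edges)

    # LD bin index in 0 .. 3, defined by quartiles over all SNPs (an assumption).
    quartiles = np.quantile(ld_measure, [0.25, 0.5, 0.75])
    ld_bin = np.digitize(ld_measure, quartiles)

    # Flatten the (MAF bin, LD bin) pair into a single component index.
    return maf_bin * 4 + ld_bin


# Hypothetical example: 1,000 simulated SNPs and five interior MAF edges,
# which gives six MAF bins and therefore 6 * 4 = 24 components.
rng = np.random.default_rng(0)
maf = rng.uniform(0.001, 0.5, size=1000)
ld_measure = rng.uniform(0.0, 1.0, size=1000)
hypothetical_maf_edges = [0.01, 0.05, 0.1, 0.2, 0.3]

components = assign_components(maf, ld_measure, hypothetical_maf_edges)
snps_per_component = np.bincount(components, minlength=24)
print(snps_per_component)
```

Printing the per-component SNP counts is a useful check in practice, since a bin with very few SNPs is one of the situations in which the text warns that estimates can turn negative.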

Discussion
Methods
Code availability