Abstract

Recently, genome-wide association studies (GWAS) have identified numerous susceptibility variants for complex diseases. In this study we proposed several approaches to estimate the total number of variants underlying these diseases. We assume that the variance explained by genetic markers (Vg) follows an exponential distribution, an assumption justified by previous studies on theories of adaptation. Our aim is to fit the observed distribution of Vg from GWAS to its theoretical distribution. The number of variants is then obtained as the heritability divided by the estimated mean of the exponential distribution. In practice, limited sample sizes leave insufficient power to detect variants with small effects, so power was taken into account in the fitting. Besides considering the most significant variants, we also relaxed the significance threshold, allowing more markers to be fitted. The effects of false positive variants were removed by considering the local false discovery rates. In addition, we developed an alternative approach that directly fits the z-statistics from GWAS to their theoretical distribution. In all cases, the “winner's curse” effect was corrected analytically, and confidence intervals were derived. Simulations were performed to compare and verify the performance of the different estimators (which incorporate various means of winner's curse correction) and the coverage of the proposed analytic confidence intervals. Our methodology requires only summary statistics and can handle both binary and continuous traits. Finally, we applied the methods to several real examples (lipid traits, type 2 diabetes and Crohn's disease) and estimated that hundreds to nearly a thousand variants underlie these traits.
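The core relation above (number of variants = heritability / mean Vg) can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the authors' estimator: it assumes that every variant with Vg above a hard cutoff is detected and none below it, whereas the study weights by power and corrects for the winner's curse and false positives. Under that assumption, the memoryless property of the exponential distribution makes the maximum-likelihood estimate of the mean equal to the average excess over the cutoff. The function name, parameter names, and input values are hypothetical.

```python
def estimate_num_variants(observed_vg, threshold, heritability):
    """Estimate the total number of variants from detected Vg values.

    Simplifying assumption (not from the paper): a variant is observed
    if and only if its Vg exceeds `threshold`. For an exponential
    distribution, the excess over a cutoff is again exponential with
    the same mean, so the MLE of the mean is mean(observed) - threshold.
    """
    if not observed_vg:
        raise ValueError("need at least one observed Vg value")
    if any(v < threshold for v in observed_vg):
        raise ValueError("all observed Vg values must be >= threshold")

    # MLE of the exponential mean under left truncation at `threshold`.
    mean_vg = sum(observed_vg) / len(observed_vg) - threshold
    if mean_vg <= 0:
        raise ValueError("estimated mean Vg must be positive")

    # Number of variants = total heritability / mean variance explained.
    return heritability / mean_vg, mean_vg


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    n_variants, mean_vg = estimate_num_variants(
        observed_vg=[0.003, 0.004, 0.005],
        threshold=0.002,
        heritability=0.5,
    )
    print(f"estimated mean Vg: {mean_vg:.4f}")
    print(f"estimated number of variants: {n_variants:.0f}")
```

With these made-up inputs the average observed Vg is 0.004, the estimated mean is 0.004 - 0.002 = 0.002, and the estimated number of variants is 0.5 / 0.002 = 250. A hard cutoff is used here only because it keeps the estimator in closed form.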

Highlights

  • The number of genome-wide association studies (GWAS) has grown rapidly in the past few years [1]

  • As overestimation of effect size will lead to overestimation of mean Vg, or underestimation of l, we investigated various statistical methods based on conditional likelihood to correct for the winner’s curse

  • In this study we proposed a variety of methods to estimate the number of susceptibility variants in the genome based on the assumption that effect sizes are exponentially distributed


Summary

Introduction

The number of genome-wide association studies (GWAS) has grown rapidly in the past few years [1]. GWAS have identified a number of robust associations for complex diseases such as breast cancer, prostate cancer, and type 1 and type 2 diabetes. We developed a methodology to estimate the total number of susceptibility variants by fitting distributions to the GWAS results. We assumed that the effect sizes of all susceptibility variants in the genome, as measured by the variance explained (Vg), follow an exponential distribution. The variance explained is computed based on the liability threshold model. The model proposes a latent continuous liability, which is assumed to follow a normal distribution with mean 0 and variance 1. The variance in liability explained can be directly interpreted as the locus-specific heritability. The method is described in detail in another paper [2].
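As a small illustration of the liability threshold model mentioned above, the following Python sketch computes the liability cutoff implied by a given disease prevalence, using only the stated assumption that liability is standard normal. It does not reproduce the Vg calculation of [2]; the function name and the prevalence values are hypothetical.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Return the cutoff T such that P(liability > T) = prevalence.

    Liability is assumed to be normal with mean 0 and variance 1,
    as in the liability threshold model; individuals whose liability
    exceeds T are affected.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - prevalence)


if __name__ == "__main__":
    # Hypothetical prevalences, for illustration only.
    for k in (0.01, 0.05, 0.10):
        print(f"prevalence {k:.2f} -> threshold {liability_threshold(k):.3f}")
```

Rarer diseases give higher thresholds, since a smaller upper tail of the liability distribution is affected.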
