Abstract

With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose SLIDE, an accurate and efficient method for multiple testing correction in genome-wide association studies. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method, SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu.

Highlights

  • Association studies have emerged as a powerful tool for discovering the genetic basis of human diseases [1,2,3]

  • The simplest way to correct for multiple hypothesis testing is the Bonferroni correction, which multiplies the p-values by the number of markers assuming the markers are independent

  • Since the markers are correlated due to linkage disequilibrium, this approach leads to a conservative estimate of false positives, adversely affecting statistical power

Summary

Introduction

Association studies have emerged as a powerful tool for discovering the genetic basis of human diseases [1,2,3]. The large number of correlated markers brings the multiple hypothesis testing correction problem to the forefront and has motivated much recent work to address it [4,5,6]. There are two common versions of the multiple testing correction problem: per-marker threshold estimation and p-value correction. In per-marker threshold estimation, we ask how significant a pointwise p-value must be for a result to be significant given that we are observing M markers. In p-value correction, the corrected p-value of a pointwise p-value is defined as the probability that, under the null hypothesis, a p-value equal to or smaller than the pointwise p-value will be observed at any marker [7]. The Bonferroni correction corrects a pointwise p-value p to pM, or estimates the per-marker threshold as α/M given a significance threshold α.
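The two versions of the problem and the Bonferroni rule can be sketched in a few lines of Python. This is only an illustration, not the SLIDE method: the function names, the AR(1) correlation between adjacent markers, and the parameter values are all assumptions made for the example. The simulation estimates the corrected p-value directly from its definition, as the probability under the null that any marker shows a pointwise p-value at or below the observed one.

```python
import math
import random


def bonferroni_corrected_p(p, num_markers):
    """Bonferroni-corrected p-value: p * M, capped at 1."""
    return min(1.0, p * num_markers)


def bonferroni_threshold(alpha, num_markers):
    """Per-marker threshold alpha / M for a study-wide level alpha."""
    return alpha / num_markers


def two_sided_p(z):
    """Two-sided pointwise p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulated_corrected_p(p, num_markers, rho, num_draws=20000, seed=0):
    """Monte Carlo estimate of P(smallest pointwise p-value <= p) under the null.

    Assumption for illustration only: null statistics form an AR(1) chain,
    so adjacent markers have correlation rho (rho = 0 means independence).
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)  # keeps each statistic at unit variance
    hits = 0
    for _ in range(num_draws):
        z = rng.gauss(0.0, 1.0)
        max_abs = abs(z)
        for _ in range(num_markers - 1):
            z = rho * z + scale * rng.gauss(0.0, 1.0)
            max_abs = max(max_abs, abs(z))
        # The largest |z| gives the smallest pointwise p-value in this draw.
        if two_sided_p(max_abs) <= p:
            hits += 1
    return hits / num_draws


if __name__ == "__main__":
    p, m = 0.001, 50
    print("Bonferroni corrected p:", bonferroni_corrected_p(p, m))
    print("Bonferroni threshold for alpha = 0.05:", bonferroni_threshold(0.05, m))
    print("Simulated, independent markers:", simulated_corrected_p(p, m, 0.0))
    print("Simulated, correlated markers:", simulated_corrected_p(p, m, 0.95))
```

Under the assumed correlation structure, the simulated corrected p-value for strongly correlated markers should come out below pM, which is the sense in which the Bonferroni correction is conservative when markers are in linkage disequilibrium.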
