Abstract

The most important decision faced in large-scale studies, such as those now common in human genetics, is distinguishing tests that are true positives from those that are not. In the context of genetics, this entails identifying the genetic markers that actually underlie medically relevant phenotypes among the vast number of markers typically interrogated in genome-wide studies. A critical part of these decisions relies on the appropriate statistical assessment of data obtained from tests across numerous markers. Several methods have been developed to aid such analyses, with family-wise approaches, such as the Bonferroni and Dunn-Šidák corrections, being popular. Conditions that motivate the use of family-wise corrections are explored. Although simple to implement, these approaches have one major limitation: they assume that p-values are i.i.d. uniformly distributed under the null hypothesis. Several factors may violate this assumption in genome-wide studies, however, including confounding by population stratification, the presence of related individuals, the correlational structure among genetic markers, and the use of limiting distributions for test statistics. Even after adjustment for such effects, the distribution of p-values can depart substantially from a uniform distribution under the null hypothesis. In this work, I present a decision theory for the use of family-wise corrections for multiplicity and a generalization of the Dunn-Šidák correction that relaxes the assumption of uniformly distributed null p-values. The independence assumption is also relaxed, handled by calculating the effective number of independent tests. I also explicitly show the relationship between order statistics and family-wise correction procedures. This generalization may be applicable to multiplicity problems outside of genomics.
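
To make the corrections concrete, the sketch below (my illustration, not the paper's code) computes the per-test significance thresholds implied by the Bonferroni and Dunn-Šidák corrections, and additionally evaluates the Šidák form at a hypothetical effective number of independent tests M_eff in place of the raw marker count, in the spirit of the relaxed independence assumption described above. All counts are illustrative assumptions, not values from the paper.

```python
# A minimal sketch (not the paper's code): per-test significance thresholds
# under the Bonferroni and Dunn-Šidák corrections, plus the Šidák form
# evaluated at a hypothetical effective number of independent tests M_eff
# in place of the raw marker count m. All counts are illustrative.

alpha = 0.05      # target family-wise error rate
m = 1_000_000     # number of markers tested (illustrative)
m_eff = 500_000   # hypothetical effective number of independent tests

bonferroni = alpha / m                      # alpha / m
sidak = 1 - (1 - alpha) ** (1 / m)          # 1 - (1 - alpha)^(1/m)
sidak_eff = 1 - (1 - alpha) ** (1 / m_eff)  # Šidák with M_eff for correlated markers

print(f"Bonferroni:       {bonferroni:.3e}")
print(f"Dunn-Šidák:       {sidak:.3e}")
print(f"Šidák via M_eff:  {sidak_eff:.3e}")
```

Because correlated markers carry redundant information, M_eff is smaller than m, and the resulting per-test threshold is less stringent than either standard correction applied to the full marker count.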

Highlights

  • A recurring question in large-scale studies concerns the proper statistical treatment of findings when numerous tests are performed

  • Suppose that a case/control genetic association study is conducted to examine the correlation between genotypes and the presence of a disease

  • Genotypes are typically measured in cases and controls for a large number of single nucleotide polymorphisms (SNPs); a sketch of a single-SNP test follows this list
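
To ground the highlighted setup, here is a minimal sketch (my illustration, with entirely hypothetical allele counts) of a single-SNP case/control association test using an allelic chi-squared test. In a genome-wide study, this test is repeated across many SNPs, creating the multiplicity problem the paper addresses.

```python
# A minimal sketch of one SNP's case/control association test using an
# allelic 2x2 chi-squared test; the counts below are entirely hypothetical.
from scipy.stats import chi2_contingency

# Rows: cases, controls; columns: counts of the A and a alleles.
table = [[300, 700],    # cases:    300 A alleles, 700 a alleles
         [250, 750]]    # controls: 250 A alleles, 750 a alleles

chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")
```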


Introduction

A recurring question in large-scale studies concerns the proper statistical treatment of findings when numerous tests are performed. Using nominal significance levels as a threshold for reporting findings is prone to false positives. This was particularly problematic in genetic epidemiology studies focused on disease association within sets of candidate genes. It is well known that the rate of replicating results from such studies was exceedingly low, suggesting ubiquitous type I errors [1,2]. Several shortcomings likely conspired to produce high type I error rates in these studies, including publication bias, population stratification, reporting of the most significant result following the use of a number of analysis methods and tests, and the previously frequent use of a nominal α = 0.05. The era of genome-wide association studies has greatly benefited from the widespread adoption of experiment-wise significance levels, which has dramatically improved the replication of positive findings [3,4].
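
As a minimal sketch of why experiment-wise thresholds control false positives, the following Monte Carlo check (my illustration, assuming i.i.d. uniform null p-values, which is precisely the assumption the paper relaxes) verifies that the Dunn-Šidák per-test threshold yields a family-wise error rate of α through the distribution of the minimum order statistic p_(1).

```python
# A minimal sketch, assuming i.i.d. Uniform(0, 1) null p-values: a Monte Carlo
# check that the Dunn-Šidák per-test threshold controls the family-wise error
# rate (FWER) at alpha through the minimum order statistic p_(1).
import numpy as np

rng = np.random.default_rng(0)
alpha, m, reps = 0.05, 1_000, 10_000
threshold = 1 - (1 - alpha) ** (1 / m)  # Šidák per-test threshold

# Under the global null, a family-wise error occurs whenever the smallest of
# the m p-values falls below the per-test threshold.
p = rng.uniform(size=(reps, m))
fwer = (p.min(axis=1) <= threshold).mean()

# Analytically, P(p_(1) <= t) = 1 - (1 - t)^m, which equals alpha exactly at
# t = threshold; non-uniform null p-values break this identity.
print(f"Empirical FWER: {fwer:.4f} (target {alpha})")
```

When null p-values depart from uniformity, the identity in the final comment no longer holds, which is what motivates the generalized correction presented in this work.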

