Abstract

Background: When many (up to millions of) statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling the family-wise error rate (FWER) or the false discovery rate (FDR) are required to reduce the number of false positive decisions. Some methods were specifically developed in the context of high-dimensional settings and partially rely on the estimation of the proportion of true null hypotheses. However, these approaches are also applied in low-dimensional settings such as replication set analyses that might be restricted to a small number of specific hypotheses. The aim of this study was to compare different approaches in low-dimensional settings using (a) real data from the CKDGen Consortium and (b) a simulation study.

Results: In both application and simulation, FWER approaches were less powerful than FDR control methods, whether a larger number of hypotheses were tested or not. Most powerful was the q-value method. However, the specificity of this method to maintain true null hypotheses was especially decreased when the number of tested hypotheses was small. In this low-dimensional situation, estimation of the proportion of true null hypotheses was biased.

Conclusions: The results highlight the importance of a sizeable data set for a reliable estimation of the proportion of true null hypotheses. Consequently, methods relying on this estimation should only be applied in high-dimensional settings. Furthermore, if the focus lies on testing a small number of hypotheses, such as in replication settings, FWER methods rather than FDR methods should be preferred to maintain high specificity.
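To illustrate why estimating the proportion of true null hypotheses can be unstable with few tests, here is a minimal sketch of a Storey-type estimator at a single fixed tuning value. The function name `estimate_pi0`, the choice `lam=0.5`, and the p-values are assumptions made for illustration; the excerpt does not state which estimator or tuning the study used, and q-value implementations often smooth over several tuning values instead.

```python
def estimate_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    pi0_hat = #{p_i > lam} / (m * (1 - lam)), capped at 1.
    A fixed lam is a simplifying assumption for this sketch.
    """
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


# Hypothetical p-values for a small replication-style analysis (m = 5)
pvals_a = [0.001, 0.004, 0.02, 0.3, 0.7]
# Same set, but one p-value moved from just below to just above lam
pvals_b = [0.001, 0.004, 0.02, 0.55, 0.7]

print(estimate_pi0(pvals_a))
print(estimate_pi0(pvals_b))
```

With only five tests, each p-value above the tuning value contributes 1 / (5 × 0.5) = 0.4 to the estimate, so a single p-value crossing that value moves the estimate from about 0.4 to about 0.8. With thousands of tests, the same change would be negligible.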

Highlights

  • When many statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling family-wise error rate (FWER) or false discovery rate (FDR) are required to reduce the number of false positive decisions

  • Data example: For the purpose of illustration, the 50 GWAS summary statistics provided by contributing study groups included in the original CKDGen discovery meta-analysis of eGFRcrea were split into two sets resembling a high-dimensional discovery set (35 studies, 90,565 individuals) and a low-dimensional replication set (15 studies, 42,848 individuals)

  • Based on a p-value threshold of < 10⁻⁶ followed by linkage disequilibrium (LD) pruning, 57 index single nucleotide polymorphisms (SNPs) from different genomic regions were selected from the discovery set

Summary

Introduction

When many (up to millions of) statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling the family-wise error rate (FWER) or the false discovery rate (FDR) are required to reduce the number of false positive decisions. Some methods were developed in the context of high-dimensional settings and partially rely on the estimation of the proportion of true null hypotheses. These approaches are also applied in low-dimensional settings such as replication set analyses that might be restricted to a small number of specific hypotheses. When testing multiple hypotheses, such as in GWAS, applying a per-test significance threshold like 0.05 across all tests will result in an unacceptably large number of false positive results. Other ways to control the type I error are therefore required.
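The scale of the problem follows from simple arithmetic: if one million independent true null hypotheses are each tested at 0.05, about 50,000 false positives are expected. As a minimal sketch of the two kinds of control named above, the following compares the Bonferroni correction (an FWER method) with the Benjamini-Hochberg step-up procedure (an FDR method). These two procedures are chosen here as standard representatives; the excerpt does not list which specific methods the study compared, and the function names and p-values are hypothetical.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Bonferroni: reject H_i if p_i <= alpha / m (controls the FWER)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the FDR
    for independent tests)."""
    m = len(pvals)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values for illustration only
pvals = [0.001, 0.008, 0.012, 0.03, 0.2, 0.6]
print(bonferroni_reject(pvals))          # rejects the first two
print(benjamini_hochberg_reject(pvals))  # rejects the first four
```

On these six p-values, Bonferroni rejects two hypotheses and Benjamini-Hochberg rejects four, which mirrors the general pattern that FDR control is more powerful but tolerates a proportion of false positives among the rejections.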

