Abstract

Gene-based tests of association are frequently applied to common SNPs (MAF>5%) as an alternative to single-marker tests. In this analysis we conduct a variety of simulation studies of five popular gene-based tests to investigate general trends in their performance in realistic situations. In particular, we focus on the impact of non-causal SNPs and a variety of LD structures on the behavior of these tests. We find that non-causal SNPs can significantly impact the power of all gene-based tests. On average, the “noise” from 6–12 non-causal SNPs cancels out the “signal” of one causal SNP across the five tests. Furthermore, we find complex and differing behavior of the methods in the presence of LD within and between non-causal and causal SNPs. Ultimately, better approaches for a priori prioritization of potentially causal SNPs (e.g., predicting functionality of non-synonymous SNPs), application of these methods to sequenced or fully imputed datasets, and limited use of window-based methods for assigning inter-genic SNPs to genes will improve power. However, significant power loss from non-causal SNPs may remain unless alternative statistical approaches robust to the inclusion of non-causal SNPs are developed.
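
The dilution effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design and it does not implement any of the five tests they evaluate. It assumes independent SNPs (no LD), a hypothetical gene-level statistic equal to the sum of squared per-SNP z-scores, and an arbitrary causal effect size (`effect=3.0`). The function names and parameters are illustrative only. Under those assumptions, it estimates how power changes as non-causal SNPs are added to a gene that contains one causal SNP.

```python
import random


def gene_statistic(n_noncausal, effect, rng):
    """Sum of squared z-scores: one (possibly) causal SNP plus null SNPs.

    Assumption: SNPs are independent, so each z-score is drawn separately.
    """
    stat = rng.gauss(effect, 1.0) ** 2          # causal SNP, mean shifted by `effect`
    for _ in range(n_noncausal):
        stat += rng.gauss(0.0, 1.0) ** 2        # non-causal SNP, pure noise
    return stat


def empirical_power(n_noncausal, effect=3.0, alpha=0.05, n_sims=20000, seed=1):
    """Estimate power of the sum statistic at level `alpha` by simulation."""
    rng = random.Random(seed)
    # Null distribution for a gene of the same size with no causal effect.
    null = sorted(gene_statistic(n_noncausal, 0.0, rng) for _ in range(n_sims))
    critical = null[min(int((1 - alpha) * n_sims), n_sims - 1)]
    # Fraction of simulated genes with a causal SNP that exceed the cutoff.
    hits = sum(
        gene_statistic(n_noncausal, effect, rng) > critical
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    for k in (0, 6, 12):
        print(f"non-causal SNPs: {k:2d}  estimated power: {empirical_power(k):.3f}")
```

In this toy setting, estimated power falls as non-causal SNPs are added, because each one adds variance to the statistic without adding signal. The sketch is not expected to reproduce the 6–12 SNP figure reported above, which comes from the authors' own simulations under realistic LD structures.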

Highlights

  • In the analysis of SNP microarray data, SNPs are aggregated into sets representing genes, pathways, or other biologically meaningful sets

  • The MAF of non-causal SNPs was not significantly related to power for any test except Logistic regression using Principal Components (LR-PC), where power decreased as the MAF of non-causal SNPs increased

  • Gene-based tests are being applied with increasing frequency to common SNPs (MAF>5%) directly measured by SNP microarrays or imputed in GWAS as an alternative to single-marker tests

Summary

Introduction

In the analysis of SNP microarray data, SNPs are aggregated into sets representing genes, pathways, or other biologically meaningful sets. Set-based tests are conducted in addition to testing for genotype-phenotype association using single-marker approaches. The set-based approach is part of a general trend in statistical genetics to leverage a priori biological knowledge in the analysis of genetic data, instead of conducting analyses in an agnostic (no prior biological knowledge considered) fashion. A substantial multiple-testing penalty (e.g., p<1×10⁻⁸) is applied to each of the single-marker association test p-values before deeming a SNP as showing significant evidence of a genotype-phenotype association. With such a small type I error cutoff for statistical significance, designing an adequately powered study can be challenging. While designing studies with tens to hundreds of thousands of subjects is possible in some situations, for many diseases it is difficult to obtain a sufficient number of cases
