Optimized permutation testing for information theoretic measures of multi-gene interactions

James M Kunert-Graf, Nikita A Sakhanenko, David J Galas

doi:10.1186/s12859-021-04107-6

https://doi.org/10.1186/s12859-021-04107-6

Abstract

Background: Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form, in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large.

Results: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with large numbers of samples.

Conclusions: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
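The idea of transforming a count table directly, rather than rebuilding it from permuted labels, can be illustrated with a minimal sketch. It assumes a binary phenotype and uses sequential hypergeometric draws, a standard way to sample a table with fixed row and column totals under the permutation null. This text does not confirm that the authors' transformation works this way, and the names `hypergeom_draw` and `permute_count_table` are hypothetical, not taken from the linked repository.

```python
import math
import random


def hypergeom_draw(rng, total, successes, draws):
    """Number of successes among `draws` items taken without replacement
    from `total` items, of which `successes` are successes."""
    lo = max(0, draws - (total - successes))
    hi = min(draws, successes)
    # Exact integer inverse-CDF sampling: by Vandermonde's identity the
    # weights below sum to comb(total, draws).
    u = rng.randrange(math.comb(total, draws))
    cumulative = 0
    for k in range(lo, hi + 1):
        cumulative += math.comb(successes, k) * math.comb(total - successes, draws - k)
        if u < cumulative:
            return k
    raise AssertionError("unreachable: weights sum to comb(total, draws)")


def permute_count_table(table, rng):
    """table[g] = [controls, cases] for genotype cell g (assumed layout).

    Returns a table distributed as if the phenotype labels had been
    shuffled, without touching any per-sample data.
    """
    remaining_total = sum(sum(row) for row in table)
    remaining_cases = sum(row[1] for row in table)
    permuted = []
    for row in table:
        n_g = sum(row)  # samples in this genotype cell stay fixed
        cases = hypergeom_draw(rng, remaining_total, remaining_cases, n_g)
        permuted.append([n_g - cases, cases])
        remaining_total -= n_g
        remaining_cases -= cases
    return permuted


# Illustrative 3x3 genotype grid for one SNP pair, flattened to 9 cells.
rng = random.Random(0)
original = [[30, 10], [25, 15], [5, 20], [12, 8], [9, 9],
            [3, 7], [4, 1], [2, 2], [1, 3]]
permuted = permute_count_table(original, rng)
```

Each call yields one permuted table whose genotype-cell totals and case/control totals match the original, so any count-based measure can be evaluated on it directly. This sketch favors exactness over speed; a production version would likely use a vectorized hypergeometric sampler.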

Highlights

Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed.
Genome-wide association studies (GWAS) have shed light on the genetics of complex traits and diseases, but single-locus analyses fail to detect the epistatic gene–gene interactions, which play a crucial role in the genetics of complex traits [1,2,3].
We focus here on the class of techniques based on information theory, which formulate entropy-based measures sensitive to multi-gene epistatic interactions.

Summary

Introduction

Permutation testing is often considered the “gold standard” for multi-test significance analysis [32, 33], as it is an exact test requiring few assumptions about the distribution being computed. It can be computationally very expensive, particularly in its naive form, in which the full analysis pipeline is re-run after permuting the phenotype labels. We focus here on the class of techniques based on information theory, which formulate entropy-based measures sensitive to multi-gene epistatic interactions. These approaches are powerful because they are inherently model-free and sensitive to nonlinear relationships [3]. Permutation testing is also the approach utilized by the majority of the above studies [20,21,22,23,24,25,26,27,29,34,35].
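For reference, the naive form described above can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's pipeline: mutual information between a joint two-SNP genotype and the phenotype stands in for the entropy-based measure, the `(exceed + 1) / (n_perm + 1)` p-value is a common convention rather than something stated here, and the function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(genotypes, phenotypes):
    """Mutual information (bits) between joint genotype and phenotype,
    computed from a frequency count table."""
    n = len(genotypes)
    joint = Counter(zip(genotypes, phenotypes))  # the count table
    geno = Counter(genotypes)
    pheno = Counter(phenotypes)
    return sum(
        (c / n) * math.log2(c * n / (geno[g] * pheno[p]))
        for (g, p), c in joint.items()
    )


def naive_permutation_pvalue(genotypes, phenotypes, n_perm, rng):
    observed = mutual_information(genotypes, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Naive form: the count table is rebuilt from every sample on
        # each iteration, so the cost grows with the number of samples.
        if mutual_information(genotypes, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: the phenotype is the XOR of two binary SNPs, a purely
# epistatic signal that neither SNP shows on its own.
genotypes = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
phenotypes = [a ^ b for a, b in genotypes]
observed, p_value = naive_permutation_pvalue(
    genotypes, phenotypes, n_perm=200, rng=random.Random(0)
)
```

The per-iteration call to `Counter` over all samples is the table-construction step that dominates the cost in this form.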

Journal: BMC Bioinformatics	Publication Date: Apr 7, 2021
Citations: 4	License type: open-access

Similar Papers

So Many Correlated Tests, So Little Time! Rapid Adjustment of P Values for Multiple Correlated Tests
Karen N Conneely ... Michael Boehnke
The American Journal of Human Genetics | VOL. 81
01 Dec 2007

Multi-locus genome-wide association study and genomic prediction for flowering time in chrysanthemum.
Jiangshuo Su ... Zhaowen Lu
Planta | VOL. 259
08 Dec 2023

Uncovering genomic regions controlling plant architectural traits in hexaploid wheat using different GWAS models
Ali Muhammad ... Shahid Ullah Khan
Scientific Reports | VOL. 11
24 Mar 2021

The choice of null distributions for detecting gene-gene interactions in genome-wide association studies.
Can Yang ... Weichuan Yu
BMC Bioinformatics | VOL. 12 Suppl 1
15 Feb 2011
