Gene-gene interaction filtering with ensemble of filters

Pengyi Yang, Joshua Wk Ho, Yee Hwa Yang, Bing B Zhou

doi:10.1186/1471-2105-12-s1-s10

Open Access

https://doi.org/10.1186/1471-2105-12-s1-s10

Journal: BMC Bioinformatics	Publication Date: Feb 15, 2011
Citations: 79	License type: CC BY 2.0

Affiliation: University of Sydney, Data61

Abstract

Background: Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies provide us the opportunity to capture those disease associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical to reduce the search space prior to the computationally intensive gene-gene interaction identification step. In this study, we show that two commonly used SNP-SNP interaction filtering algorithms, ReliefF and tuned ReliefF (TuRF), are sensitive to the order of the samples in the dataset, giving rise to unstable and suboptimal results. However, we observe that the ‘unstable’ results from multiple runs of these algorithms can provide valuable information about the dataset. We therefore hypothesize that aggregating results from multiple runs of the algorithm may improve the filtering performance.

Results: We propose a simple and effective ensemble approach in which the results from multiple runs of an unstable filter are aggregated based on the general theory of ensemble learning. The ensemble versions of the ReliefF and TuRF algorithms, referred to as ReliefF-E and TuRF-E, are robust to sample order dependency and enable a more informative investigation of data characteristics. Using simulated and real datasets, we demonstrate that both the ensemble of ReliefF and the ensemble of TuRF can generate a much more stable SNP ranking than the original algorithms. Furthermore, the ensemble of TuRF achieved the highest success rate in comparison to many state-of-the-art algorithms as well as traditional χ²-test and odds ratio methods in terms of retaining gene-gene interactions.
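The ensemble idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_relief` is a simplified single-neighbour Relief-style scorer standing in for ReliefF or TuRF, the names `ensemble_filter` and `rank_snps` are hypothetical, and averaging per-SNP scores across runs is an assumed aggregation rule, since the abstract does not say how the runs are combined.

```python
import random
from typing import Callable, List, Sequence

Sample = Sequence[int]  # genotype codes (e.g. 0/1/2) for one individual
Filter = Callable[[List[Sample], List[int]], List[float]]


def toy_relief(samples: List[Sample], labels: List[int]) -> List[float]:
    """Simplified Relief-style scorer (one nearest hit, one nearest miss).

    Distance ties are broken by whichever sample comes first, which is
    one way a filter can become dependent on sample order.
    """
    n_snps = len(samples[0])
    weights = [0.0] * n_snps
    for i, (x, y) in enumerate(zip(samples, labels)):
        hit = miss = None
        hit_d = miss_d = None
        for j, (z, z_label) in enumerate(zip(samples, labels)):
            if i == j:
                continue
            d = sum(a != b for a, b in zip(x, z))  # mismatch count
            if z_label == y:
                if hit_d is None or d < hit_d:
                    hit, hit_d = z, d
            elif miss_d is None or d < miss_d:
                miss, miss_d = z, d
        if hit is None or miss is None:
            continue
        for s in range(n_snps):
            # Reward SNPs that differ from the miss, penalise those
            # that differ from the hit.
            weights[s] += (x[s] != miss[s]) - (x[s] != hit[s])
    return [w / len(samples) for w in weights]


def ensemble_filter(samples: List[Sample], labels: List[int],
                    base_filter: Filter, n_runs: int = 10,
                    seed: int = 0) -> List[float]:
    """Average per-SNP scores over runs on randomly reordered samples.

    Mean-score aggregation is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    for _ in range(n_runs):
        rng.shuffle(order)  # new sample order for this run
        run_x = [samples[i] for i in order]
        run_y = [labels[i] for i in order]
        for s, score in enumerate(base_filter(run_x, run_y)):
            totals[s] += score
    return [t / n_runs for t in totals]


def rank_snps(scores: List[float]) -> List[int]:
    """Return SNP indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
```

In use, one would pass the genotype matrix and case/control labels to `ensemble_filter` and keep the top-ranked SNPs from `rank_snps` for the downstream interaction search. Any filter with the same call signature could replace `toy_relief`.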

Highlights

Complex diseases are commonly caused by multiple genes and their interactions with each other.
Under the assumption that common diseases are associated with common variants, genome-wide association (GWA) studies often aim to identify a set of single nucleotide polymorphisms (SNPs) that are statistically associated with a target disease.
The effect of sample order dependency: the SNP rankings generated by ReliefF and tuned ReliefF (TuRF) are sensitive to the order in which the samples are presented in the dataset.
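One way to quantify how much a SNP ranking changes with sample order is to compare the top-ranked SNPs across several runs. The sketch below uses the mean pairwise Jaccard overlap of the top-k sets. This metric and the function names are assumptions made for illustration; the page does not state which stability measure the authors used.

```python
from itertools import combinations
from typing import List, Sequence


def top_k_overlap(rank_a: Sequence[int], rank_b: Sequence[int], k: int) -> float:
    """Jaccard overlap between the top-k SNPs of two rankings."""
    top_a, top_b = set(rank_a[:k]), set(rank_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


def mean_pairwise_overlap(rankings: List[Sequence[int]], k: int) -> float:
    """Average top-k overlap over all pairs of rankings.

    Values near 1.0 suggest a stable ranking; values near 0.0 suggest
    the top-k set changes substantially from run to run.
    """
    pairs = list(combinations(rankings, 2))
    if not pairs:
        raise ValueError("need at least two rankings")
    return sum(top_k_overlap(a, b, k) for a, b in pairs) / len(pairs)
```

Each ranking here would come from one run of a filter on a differently ordered copy of the same dataset.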

Summary

Introduction

Complex diseases are commonly caused by multiple genes and their interactions with each other. Several recent studies indicate that many complex traits cannot be explained by any single SNP variant, and that the characterization of gene-gene and gene-environment interactions may be the key to understanding the underlying pathogenesis of these complex diseases [3,4,5]. For this reason, several methods have been developed to jointly evaluate SNP and environmental factors with the aim of identifying gene-gene and gene-environment interactions that make major contributions to complex diseases [6]. These methods analyze genetic factors in a combinatorial manner when applied to SNP datasets with case and control samples. Popular combinatorial methods include random forest based algorithms [7,8], multifactor dimensionality reduction (MDR) [9,10], Bayesian algorithms [11], and evolutionary approaches [12,13].

Methods

Results

Discussion

Conclusion
