Abstract

Background: Identifying gene-gene interactions is essential to understanding disease susceptibility and to detecting the genetic architectures underlying complex diseases. Here, we aimed to develop a permutation-based methodology that relies on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identifies the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions.

Results: We systematically tested our approach in a simulation study with datasets spanning various genetic constraints, including heritability, number of SNPs and sample size, among others. Our methodology showed high success rates for detecting the interacting SNP pair. We also applied our approach to two bladder cancer datasets, where the results were consistent with those of well-studied methodologies such as multifactor dimensionality reduction (MDR) and statistical epistasis network (SEN). Furthermore, we built permuted random forest networks (PRFN), in which nodes represent SNPs and edges indicate interactions.

Conclusions: We successfully developed a scale-invariant methodology to detect pure gene-gene interactions based on permutation strategies and the machine learning method random forest. This methodology shows great potential for detecting gene-gene interactions and for studying underlying genetic architectures in a scale-free way, which could help uncover complex disease mechanisms.

Highlights

  • Identifying gene-gene interactions is essential to understanding disease susceptibility and to detecting the genetic architectures underlying complex diseases

  • Different strategies have been designed to solve such problems. These include applying filter algorithms, such as Spatially Uniform ReliefF (SURF), to reduce the number of single nucleotide polymorphisms (SNPs) in the analysis by removing redundant SNPs as needed, and performing pathway analysis to subset the SNP dataset based on similar biological functions [12, 13]

  • Permuted random forest: We proposed a method called permuted random forest to address two questions. First, given a SNP dataset, how can we detect SNP-SNP interactions accurately? Second, how can we analyze all SNPs together in one model to incorporate multi-SNP interactions, instead of analyzing each interaction using only the data from that pair of SNPs? In our approach, we quantified the interaction signal by estimating how much the signal contributes to the model's prediction power
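
The idea in the last highlight, scoring an interaction by how much prediction power is lost when that interaction is removed, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy dataset, the within-class shuffling used to remove the pairwise interaction while keeping each SNP's per-class genotype counts, and the small majority-vote lookup model that stands in for a random forest so the example needs only the Python standard library. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def make_toy_data(copies=10):
    """Two SNPs (genotypes 0/1/2) with an XOR-style interaction and no main effects."""
    weights = {0: 1, 1: 2, 2: 1}  # assumed genotype frequencies 1/4, 1/2, 1/4
    rows, labels = [], []
    for a, wa in weights.items():
        for b, wb in weights.items():
            label = int((a == 1) != (b == 1))  # case only if exactly one SNP is heterozygous
            for _ in range(wa * wb * copies):
                rows.append([a, b])
                labels.append(label)
    return rows, labels

def fit_lookup_model(rows, labels):
    """Stand-in for the random forest: majority class per genotype combination."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[tuple(row)][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return lambda row: table.get(tuple(row), 0)

def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def permute_pair_within_class(rows, labels, i, j, rng):
    """Shuffle SNP i and SNP j independently inside each class.

    Per-class genotype counts of each SNP (main effects) are unchanged,
    but the joint pattern between the two SNPs is broken.
    """
    permuted = [list(r) for r in rows]
    for cls in set(labels):
        idx = [k for k, y in enumerate(labels) if y == cls]
        for col in (i, j):
            values = [rows[k][col] for k in idx]
            rng.shuffle(values)
            for k, v in zip(idx, values):
                permuted[k][col] = v
    return permuted

def class_genotype_counts(rows, labels, col):
    """Counts of (class, genotype) for one SNP, used to check main effects."""
    return Counter((y, r[col]) for r, y in zip(rows, labels))

rows, labels = make_toy_data()
model = fit_lookup_model(rows, labels)
permuted = permute_pair_within_class(rows, labels, 0, 1, random.Random(0))

acc_original = accuracy(model, rows, labels)
acc_permuted = accuracy(model, permuted, labels)
interaction_score = acc_original - acc_permuted  # loss of prediction power
print(acc_original, acc_permuted, interaction_score)
```

In this sketch the model is trained once on the original data and then evaluated on the permuted copy, so the drop in accuracy reflects only the removed pairwise pattern. A real analysis would replace the lookup model with a random forest fitted on all SNPs.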


Summary

Introduction

Identifying gene-gene interactions is essential to understanding disease susceptibility and to detecting the genetic architectures underlying complex diseases. Genome-wide association studies (GWASs) have revolutionized the identification of the effects of single nucleotide polymorphisms (SNPs) on disease susceptibility and the detection of genetic architectures underlying complex diseases, such as type II diabetes, obesity and cancer, from large-scale genotyping data [1,2,3,4,5]. GWASs have uncovered a great number of disease susceptibility loci, yet we still have very limited knowledge of the genetic architecture of some diseases and cannot accurately predict disease risk from genetic information [6]. This is challenging because of genetic heterogeneity, epistasis (gene-gene interactions) and gene-environment interactions. Traditional methods used to analyze genetic-disease associations include linear regression, logistic regression and the chi-square test, among others. These approaches map single loci one at a time to detect main effects, but ignore interactions between them. Different strategies have been designed to solve such problems. These include applying filter algorithms, such as Spatially Uniform ReliefF (SURF), to reduce the number of SNPs in the analysis by removing redundant SNPs as needed, and performing pathway analysis to subset the SNP dataset based on similar biological functions [12, 13].
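
To make concrete why testing single loci one at a time can miss an interaction, here is a small sketch on a constructed toy dataset. The dataset, the XOR-style disease rule and the helper name `chi_square_statistic` are assumptions for illustration only and do not come from the paper. The code computes a plain Pearson chi-square statistic for each SNP alone and for the two SNPs considered jointly.

```python
from collections import Counter

def chi_square_statistic(genotypes, labels):
    """Pearson chi-square statistic for a genotype-by-class contingency table."""
    n = len(labels)
    cell = Counter(zip(genotypes, labels))
    geno_totals = Counter(genotypes)
    class_totals = Counter(labels)
    stat = 0.0
    for g, g_total in geno_totals.items():
        for c, c_total in class_totals.items():
            expected = g_total * c_total / n
            stat += (cell[(g, c)] - expected) ** 2 / expected
    return stat

# Constructed toy data (an assumption): genotype frequencies 1/4, 1/2, 1/4,
# and a subject is a case only if exactly one of the two SNPs is heterozygous.
weights = {0: 1, 1: 2, 2: 1}
snp_a, snp_b, status = [], [], []
for a, wa in weights.items():
    for b, wb in weights.items():
        for _ in range(wa * wb * 10):
            snp_a.append(a)
            snp_b.append(b)
            status.append(int((a == 1) != (b == 1)))

# Single-locus tests: each SNP on its own shows no association with status.
stat_a = chi_square_statistic(snp_a, status)
stat_b = chi_square_statistic(snp_b, status)

# Joint test: treating each genotype pair as one category exposes the signal.
stat_pair = chi_square_statistic(list(zip(snp_a, snp_b)), status)

print(stat_a, stat_b, stat_pair)
```

On this constructed dataset each single-locus statistic is exactly zero, because cases and controls are perfectly balanced within every genotype of either SNP, while the joint statistic is large. Real data would not be this clean; the example only shows the mechanism by which a main-effect scan can overlook a purely interactive signal.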
