Abstract

The primary goal of genome-wide association studies (GWAS) is to discover genes or variants associated with complex diseases. Most GWAS use single-SNP (single nucleotide polymorphism) approaches, which assess the association between each individual SNP and the disease and therefore cannot account for combinations of SNPs. However, complex diseases are thought to have complex etiologies that involve interactions among many SNPs. Different approaches are thus needed to identify SNPs that influence disease risk jointly or through complex interactions. To discover SNP-SNP interactions, in this paper we propose first applying an improved Random Forest algorithm tailored to structured GWAS data and then extracting all rules from the resulting trees to analyse SNP interactions. Our method selects subgroups of informative SNPs that are most relevant to the disease, uses them to build accurate decision trees, and then derives SNP interactions from these trees. In this way, it reduces dimensionality and can perform well on high-dimensional SNP data sets. We conducted experiments on two genome-wide SNP data sets to demonstrate the effectiveness of the method for detecting SNP-SNP interactions.
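
The rule-extraction step described above can be illustrated with a minimal sketch. It is not the method proposed in the paper: the nested-dictionary tree representation, the helper names (`leaf`, `node`, `extract_rules`, `count_snp_pairs`), the SNP identifiers, and the use of simple pair co-occurrence counts as an interaction signal are all assumptions made for illustration. The sketch only shows how root-to-leaf rules might be read off a set of already-built decision trees and how SNPs that appear together in a rule might be tallied.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy representation (an assumption, not the paper's data structure):
# internal nodes test one SNP genotype coded 0/1/2, leaves carry a class label.
def leaf(label):
    return {"label": label}

def node(snp, threshold, left, right):
    # left branch: genotype <= threshold; right branch: genotype > threshold
    return {"snp": snp, "threshold": threshold, "left": left, "right": right}

def extract_rules(tree, path=()):
    """Return every root-to-leaf rule as a (conditions, label) pair."""
    if "label" in tree:
        return [(path, tree["label"])]
    snp, t = tree["snp"], tree["threshold"]
    return (extract_rules(tree["left"], path + ((snp, "<=", t),))
            + extract_rules(tree["right"], path + ((snp, ">", t),)))

def count_snp_pairs(forest):
    """Count how often two distinct SNPs occur together in a single rule."""
    counts = Counter()
    for tree in forest:
        for conditions, _label in extract_rules(tree):
            snps = sorted({snp for snp, _op, _t in conditions})
            counts.update(combinations(snps, 2))
    return counts

# Three hand-written trees standing in for a trained forest (illustrative only).
forest = [
    node("rs1", 0, leaf("control"), node("rs2", 1, leaf("control"), leaf("case"))),
    node("rs2", 1, leaf("control"), node("rs1", 0, leaf("control"), leaf("case"))),
    node("rs3", 0, leaf("control"), leaf("case")),
]

pair_counts = count_snp_pairs(forest)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

In this toy forest, only the hypothetical SNPs `rs1` and `rs2` ever share a rule, so they form the single candidate pair; a real analysis would need a statistically grounded measure of interaction and not raw co-occurrence counts.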
