Abstract

Identifying single nucleotide polymorphism (SNP) interactions is a popular and important way to explain the missing heritability of complex diseases in genome-wide association studies (GWAS). Many approaches have been proposed to detect SNP interactions, but they generally suffer from high computational complexity caused by the explosion in the number of candidate high-order interactions. In this paper, we propose a two-stage approach, called ClusterMI, to detect high-order genome-wide SNP interactions based on significant pairwise SNP combinations. In the screening stage, to reduce the computational burden, ClusterMI first applies a clustering algorithm combined with mutual information to divide SNPs into clusters. It then uses conditional mutual information to screen significant pairwise SNP combinations within each cluster. This raises the probability of identifying significant two-locus combinations in each cluster and greatly reduces the computational load of the follow-up search. In the search stage, two search strategies (exhaustive search and an improved ant colony optimization search) are provided to detect high-order SNP interactions, depending on the number of significant two-locus combinations. Extensive simulation experiments show that ClusterMI performs better than other related and competitive approaches. Experiments on two real case-control datasets from the Wellcome Trust Case Control Consortium (WTCCC) also demonstrate that ClusterMI is more capable of identifying high-order SNP interactions from genome-wide data.
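The screening stage relies on mutual information and conditional mutual information between SNPs and the phenotype. The following is a minimal sketch of those two quantities for discrete genotype data, not the ClusterMI implementation: the function names, the toy data, and the choice to condition on the second SNP of a pair are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (
        entropy(list(zip(x, z)))
        + entropy(list(zip(y, z)))
        - entropy(list(zip(x, y, z)))
        - entropy(z)
    )


# Hypothetical toy data: the phenotype is the XOR of two binary-coded SNPs,
# so neither SNP is informative alone but the pair is.
snp_a = [0, 0, 1, 1] * 2
snp_b = [0, 1, 0, 1] * 2
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]

marginal = mutual_information(snp_a, phenotype)
conditional = conditional_mutual_information(snp_a, phenotype, snp_b)
print(f"I(A;Y) = {marginal:.3f} bits, I(A;Y|B) = {conditional:.3f} bits")
```

In this toy case the marginal mutual information is zero while the conditional value is one bit, which is the kind of pure interaction effect that single-SNP tests miss and pairwise screening is meant to surface.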

Highlights

  • Genome-wide association study (GWAS) has become a popular and powerful tool for studying human complex diseases [1]

  • We propose a two-stage approach named ClusterMI (Clustering combined with Mutual Information) to detect high-order single nucleotide polymorphism (SNP) interactions based on two-locus combinations

  • We adopt the same measurement of power suggested by Wan et al. [10]: Power = DT / D, where DT is the number of datasets in which true SNP interactions can be successfully identified and D is the number of all datasets
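The power measure in the last highlight is the fraction of simulated datasets in which the true interaction is recovered. A minimal sketch of that calculation follows; the function name, the data layout, and the criterion for "successfully identified" (the true SNP set appears among the reported interactions, regardless of order) are assumptions for illustration, not the paper's exact evaluation procedure.

```python
def detection_power(reported_per_dataset, true_interaction):
    """Return DT / D: the share of datasets whose reported interactions
    contain the true SNP interaction (SNP order is ignored)."""
    if not reported_per_dataset:
        raise ValueError("at least one dataset is required")
    truth = frozenset(true_interaction)
    # DT: datasets in which the true interaction is among those reported.
    hits = sum(
        1
        for reported in reported_per_dataset
        if any(frozenset(candidate) == truth for candidate in reported)
    )
    # D: total number of datasets.
    return hits / len(reported_per_dataset)


# Hypothetical results for four simulated datasets; SNPs are given by index.
reported = [
    [(3, 7)],          # true interaction found
    [(7, 3), (1, 2)],  # found, reported in reverse order
    [(1, 2)],          # missed
    [],                # nothing reported
]
power = detection_power(reported, true_interaction=(3, 7))
print(power)
```

With these toy results, two of the four datasets recover the true interaction, so the sketch gives a power of 0.5.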


Summary

Introduction

Genome-wide association study (GWAS) has become a popular and powerful tool for studying human complex diseases [1]. Most such studies only evaluate the statistical significance of a single SNP based on the selected case and control samples. A single SNP is considered to be associated with a complex disease if and only if its frequency in the cases is significantly higher or lower than that in the controls. However, single SNPs cannot completely explain the pathogenesis of human complex diseases [3,4,5]; SNP interactions among multiple genes play an essential role in that pathogenesis [6]. Precisely detecting SNP interactions therefore contributes to a better understanding of the genetic mechanisms of complex diseases.

