Abstract

Genome-wide association studies (GWAS) assess millions of single nucleotide polymorphisms (SNPs) to find genetic variations associated with a particular disease. The huge amount of data produced by GWAS poses a great challenge for data analysis, particularly for combinatorial methods such as Multifactor Dimensionality Reduction (MDR). The MDR approach aims to reduce the dimensionality of the multi-locus genotype space to facilitate the identification of gene-gene interactions. This method can be computationally intensive, especially when more than a few hundred polymorphisms need to be evaluated. Problems of this kind cannot be solved in a reasonable amount of time on conventional computers. Grid computing can address computational problems such as GWAS by enabling software applications to take advantage of widespread computational resources that are managed by diverse organizations in a secure and reliable way. We are therefore motivated to extend the MDR software to support distributed execution on available grid resources. In this paper we propose a framework for supporting parallel execution of the MDR method in Grid environments. This framework helps biologists to perform large-scale epistatic interaction analysis in GWAS. Given a large number of SNPs, our framework distributes SNP combinations over computing nodes to scale up to large datasets. The experimental results demonstrate that our framework is capable of analyzing datasets of 50,000 SNPs.
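
The abstract does not say how SNP combinations are assigned to computing nodes. As an illustration only, the following Python sketch shows one plausible scheme: the combinations are enumerated in lexicographic order and split into contiguous, near-equal rank ranges, one per worker. The function names, the interaction-order parameter, and the partitioning scheme are assumptions made for this example and are not taken from the paper's framework.

```python
from itertools import combinations, islice
from math import comb


def chunk_bounds(n_snps, order, n_workers):
    """Split the comb(n_snps, order) SNP combinations into contiguous
    [start, stop) rank ranges of near-equal size, one per worker.
    Hypothetical helper, not part of the MDR software."""
    total = comb(n_snps, order)
    base, extra = divmod(total, n_workers)
    bounds = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional combination.
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def combinations_for_worker(n_snps, order, bound):
    """Lazily yield the SNP index tuples in one worker's rank range.
    islice still steps through the skipped prefix, so a real system would
    more likely jump straight to the starting combination."""
    start, stop = bound
    return islice(combinations(range(n_snps), order), start, stop)


if __name__ == "__main__":
    bounds = chunk_bounds(n_snps=6, order=2, n_workers=4)
    for worker_id, bound in enumerate(bounds):
        print(worker_id, list(combinations_for_worker(6, 2, bound)))
```

The scale of the problem follows from the count alone: for 50,000 SNPs, pairwise interactions give comb(50000, 2) = 1,249,975,000 combinations, and each is evaluated independently of the others, which is what makes this kind of partitioning possible.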
