ParaHaplo 2.0: a program package for haplotype-estimation and haplotype-based whole-genome association study using parallel computing

Kazuharu Misawa,Naoyuki Kamatani

doi:10.1186/1751-0473-5-5

Abstract

BackgroundThe use of haplotype-based association tests can improve the power of genome-wide association studies. Since the observed genotypes are unordered pairs of alleles, haplotype phase must be inferred. However, estimating haplotype phase is time consuming. When millions of single-nucleotide polymorphisms (SNPs) are analyzed in genome-wide association study, faster methods for haplotype estimation are required.MethodsWe developed a program package for parallel computation of haplotype estimation. Our program package, ParaHaplo 2.0, is intended for use in workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on both Japanese in Tokyo, Japan and Han Chinese in Beijing, China of the HapMap dataset.ResultsParallel version of ParaHaplo 2.0 can estimate haplotypes 100 times faster than a non-parallel version of the ParaHaplo.ConclusionParaHaplo 2.0 is an invaluable tool for conducting haplotype-based genome-wide association studies (GWAS). The need for fast haplotype estimation using parallel computing will become increasingly important as the data sizes of such projects continue to increase. The executable binaries and program sources of ParaHaplo are available at the following address: http://en.sourceforge.jp/projects/parallelgwas/releases/

Highlights

The use of haplotype-based association tests can improve the power of genome-wide association studies
To cope with problems related to multiple comparisons in Genome-wide association studies (GWAS), haplotype-based algorithms were developed to correct for multiple comparisons at multiple single-nucleotide polymorphisms (SNPs) loci in linkage disequilibrium [5]
Parallel Computation of Haplotype-based GWAS The results showed that the parallel computing ability of ParaHaplo 2.0 for haplotype estimation was 100 times faster than non-parallel version of ParaHaplo 2.0

Summary

Introduction

The use of haplotype-based association tests can improve the power of genome-wide association studies. When millions of single-nucleotide polymorphisms (SNPs) are analyzed in genome-wide association study, faster methods for haplotype estimation are required. Recent advances in various high-throughput genotyping technologies have allowed us to test allele frequency differences between case and control populations on a genome-wide scale [1]. Genome-wide association studies (GWAS) are used to compare the frequency of alleles or genotypes of a particular variant between cases and controls for a particular disease across a given genome [2,3,4]. To cope with problems related to multiple comparisons in GWAS, haplotype-based algorithms were developed to correct for multiple comparisons at multiple SNP loci in linkage disequilibrium [5]. To conduct haplotype-GWAS within a short time period, Misawa and Kamatani [9] developed ParaHaplo 1.0, a set of computer programs for the parallel computation of accurate P values in haplotype-based GWAS by using the MCMC [5] and RAT [6].algorithms

Methods

Results

Discussion

Conclusion