Coupling SIMD and SIMT architectures to boost performance of a phylogeny-aware alignment kernel

Nikolaos Alachiotis, Simon A Berger, Alexandros Stamatakis

doi:10.1186/1471-2105-13-196

Open Access

Abstract

Background

Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The algorithm aligns short reads to phylogenetic profiles that correspond to the branches of such a reference tree. Because it needs to perform an immense number of pairwise alignments, we explore vector intrinsics and GPUs to accelerate the PaPaRa alignment kernel.

Results

We optimized and parallelized PaPaRa on CPUs and GPUs. Using SSE 4.1 SIMD (Single Instruction, Multiple Data) intrinsics for x86 SIMD architectures together with multi-threading, we obtained a 9-fold acceleration on a single core as well as linear speedups with respect to the number of cores. The peak CPU performance amounts to 18.1 GCUPS (Giga Cell Updates per Second) using all four physical cores of an Intel i7 2600 CPU running at 3.4 GHz. The average CPU performance (averaged over all test runs) is 12.33 GCUPS. We also used OpenCL to execute PaPaRa on a GPU SIMT (Single Instruction, Multiple Threads) architecture. An NVIDIA GeForce 560 GPU delivered peak and average performance of 22.1 and 18.4 GCUPS, respectively. Finally, we combined the SIMD and SIMT implementations into a hybrid CPU-GPU system that achieved an accumulated peak performance of 33.8 GCUPS.

Conclusions

This accelerated version of PaPaRa (available at http://www.exelixis-lab.org/software.html) provides a significant performance improvement that allows larger datasets to be analyzed in less time. We observe that state-of-the-art SIMD and SIMT architectures deliver comparable performance for this dynamic programming kernel when the “competing programmer approach” is deployed. Finally, we show that overall performance can be substantially increased by designing a hybrid CPU-GPU system with appropriate load distribution mechanisms.
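
GCUPS, the metric used throughout, is the number of dynamic programming matrix cells computed divided by the run time, expressed in units of 10^9 cells per second. The Python sketch below computes it. It also shows one hypothetical way to distribute work in a hybrid system: a batch of equally sized alignment tasks is split between CPU and GPU in proportion to their measured throughput, so that both devices finish at about the same time. The function names `gcups` and `split_tasks` are illustrative, and this static proportional split is an assumption. It is not the load distribution mechanism used in PaPaRa.

```python
def gcups(cells: int, seconds: float) -> float:
    """Giga Cell Updates per Second: DP cells computed per second, / 10^9."""
    return cells / seconds / 1e9


def split_tasks(n_tasks: int, cpu_gcups: float,
                gpu_gcups: float) -> tuple[int, int]:
    """Hypothetical static split: each device gets a share of equally sized
    tasks proportional to its throughput, so both finish at about the same
    time (n_cpu / cpu_gcups == n_gpu / gpu_gcups, up to rounding)."""
    cpu_tasks = round(n_tasks * cpu_gcups / (cpu_gcups + gpu_gcups))
    return cpu_tasks, n_tasks - cpu_tasks


# 10^11 cell updates finished in 5 s correspond to 20 GCUPS.
print(gcups(10**11, 5.0))                  # 20.0
# Split 10,000 tasks using the average CPU and GPU rates from the abstract.
print(split_tasks(10_000, 12.33, 18.4))    # (4012, 5988)
```

A purely proportional split ignores data-transfer and host-side overheads. The 33.8 GCUPS hybrid peak reported above is lower than the sum of the individual CPU and GPU peaks (18.1 + 22.1 = 40.2 GCUPS), so the two devices' rates do not simply add up. A practical scheduler has to account for effects that this sketch ignores.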

Highlights

Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context
To assess the GPU performance of the Open Computing Language (OpenCL) Single Instruction, Multiple Threads (SIMT) implementation, we used a heterogeneous system equipped with an Intel i7 2600 CPU running at 3.4 GHz (SIMD platform) and an NVIDIA GeForce 560 GPU with 336 Compute Unified Device Architecture (CUDA) cores and 1 GB of GDDR5 device memory (SIMT platform)
We observed that state-of-the-art CPUs and GPUs deliver comparable performance for sequence alignment algorithms if properly optimized
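
The abstract describes aligning reads to profiles instead of to single sequences. The sketch below is a heavily simplified, hypothetical illustration of that idea. It does not reproduce PaPaRa's scoring scheme, gap model, or use of the reference tree, which are described in [1]. Here, each profile column is stored as a 4-bit set of the nucleotides allowed at that position, and a read base counts as a match when its bit is present in the set. The whole read is aligned, but it may start and end anywhere in the profile, because a short read typically covers only part of the reference alignment. The encoding, the scores, and the names `encode_column` and `align_read_to_profile` are all assumptions.

```python
BITS = {"A": 1, "C": 2, "G": 4, "T": 8}   # hypothetical 4-bit nucleotide encoding


def encode_column(bases: str) -> int:
    """Bitmask of the nucleotides allowed in one profile column."""
    mask = 0
    for base in bases:
        mask |= BITS[base]
    return mask


def align_read_to_profile(read: str, profile: list[int], match: int = 2,
                          mismatch: int = -1, gap: int = 3) -> int:
    """Best score for aligning the whole read against the profile, with free
    leading and trailing profile columns (semi-global alignment)."""
    n = len(profile)
    prev = [0] * (n + 1)             # empty read prefix: any start column is free
    for i in range(1, len(read) + 1):
        cur = [-gap * i] + [0] * n   # first i read bases against no columns
        r = BITS[read[i - 1]]
        for j in range(1, n + 1):
            # Match if the read base's bit is in the column's set.
            s = match if r & profile[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # read base aligned to profile[j - 1]
                         prev[j] - gap,     # read base opposite a gap
                         cur[j - 1] - gap)  # profile column opposite a gap in the read
        prev = cur
    return max(prev)                 # any end column is free


profile = [encode_column(c) for c in ["A", "CT", "G", "GA", "T"]]
print(align_read_to_profile("TGA", profile))  # 6
```

Membership is tested with a single bitwise AND, so this style of encoding lends itself to SIMD implementations that evaluate many cells at once.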

Summary

Introduction

Aligning short DNA reads to a reference sequence alignment is a prerequisite for detecting their biological origin and analyzing them in a phylogenetic context. With the PaPaRa tool we introduced a dedicated dynamic programming algorithm for simultaneously aligning short reads to reference alignments and corresponding evolutionary reference trees. The PaPaRa tool [1] implements a new method for aligning a typically large number of short sequence reads against a reference multiple sequence alignment (MSA) and a corresponding phylogenetic tree. By contrast, HMMALIGN, MUSCLE, and MAFFT align short sequence reads against a single, monolithic profile derived from the reference alignment. Dynamic programming alignment algorithms generally exhibit a time complexity of O(mn) for aligning two sequences of lengths m and n against each other. This can become a limiting factor when either two long sequences or a large number of sequences are aligned. Because of the analogies between the Smith-Waterman algorithm (SWA) and the PaPaRa kernel, we briefly survey SWA optimization efforts.
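
The paragraph above relates PaPaRa to the Smith-Waterman algorithm. The following textbook Smith-Waterman local alignment score computation shows where the O(mn) cost comes from: every pair (i, j) requires one cell update, and that count is exactly what GCUPS measures. The sketch uses a linear gap penalty as a simplification, and its scoring parameters are illustrative assumptions. SIMD and SIMT implementations speed up this recurrence by computing many cells concurrently, not by reducing the number of cells.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = 2) -> tuple[int, int]:
    """Return the best local alignment score of a and b (linear gap penalty)
    and the number of DP cells updated, which equals len(a) * len(b)."""
    n = len(b)
    prev = [0] * (n + 1)        # row i-1 of the DP matrix
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (n + 1)     # row i; column 0 stays 0 (local alignment)
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            cur[j] = max(0,
                         prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] - gap,     # a[i-1] against a gap in b
                         cur[j - 1] - gap)  # b[j-1] against a gap in a
            best = max(best, cur[j])
            cells += 1
        prev = cur
    return best, cells


print(smith_waterman("ACGT", "TACGA"))  # (6, 20)
```

The returned cell count is always len(a) * len(b), so dividing the total cell count of a batch by its run time gives the GCUPS figures quoted in the abstract.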

Journal: BMC Bioinformatics
Publication Date: Aug 9, 2012
Citations: 29
License type: CC BY 2.0

Similar Papers

Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs
Christopher P Stone ... Kyle E Niemeyer
Computer Physics Communications | VOL. 226
07 Feb 2018

Performance Analysis of Existing SIMD Architectures
Chao Cui ... Zhicheng Jin
01 Jan 2019

Acceleration of Full-Search Algorithm on SIMD Architectures by Using Eight-Bit Partial Sums of Four Luminance Values
C J Duanmu
01 Dec 2006

Divergent Branch Threads Compaction for Efficient SIMD Control Flow
Hui Yang ... Jianghua Wan
Chinese Journal of Electronics | VOL. 24
01 Apr 2015
