Abstract
In large DNA sequence repositories, archival data storage is often coupled with computers that provide 40 or more CPU threads and multiple GPU (general-purpose graphics processing unit) devices. This presents an opportunity for DNA sequence alignment software to exploit high-concurrency hardware to generate short-read alignments at high speed. Arioc, a GPU-accelerated short-read aligner, can compute WGS (whole-genome sequencing) alignments ten times faster than comparable CPU-only alignment software. When two or more GPUs are available, Arioc's speed increases proportionately because the software executes concurrently on each available GPU device. We have adapted Arioc to recent multi-GPU hardware architectures that support high-bandwidth peer-to-peer memory accesses among multiple GPUs. By modifying Arioc's implementation to exploit this GPU memory architecture we obtained a further 1.8x-2.9x increase in overall alignment speeds. With this additional acceleration, Arioc computes two million short-read alignments per second in a four-GPU system; it can align the reads from a human WGS sequencer run-over 500 million 150nt paired-end reads-in less than 15 minutes. As WGS data accumulates exponentially and high-concurrency computational resources become widespread, Arioc addresses a growing need for timely computation in the short-read data analysis toolchain.
Highlights
Short-read DNA sequencing technology has been estimated to generate 35 petabases of DNA sequencer data per year [1], with the amount of new data increasing exponentially [2]
Highthroughput computational resources are becoming an integral part of the local computing environment in data centers that archive sequencing data
We carried out speed-versus-sensitivity experiments using all four sets of sequencer reads and with all three lookup tables (LUTs) layouts in GPU memory
Summary
Short-read DNA sequencing technology has been estimated to generate 35 petabases of DNA sequencer data per year [1], with the amount of new data increasing exponentially [2]. When GPU memory is large enough to contain these tables and P2P memory interconnect is supported, LUT data accesses execute 10 or more times faster and the overall speed of the software increases .
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.