SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner.

Ruibang Luo, Tak-Wah Lam, Haoxiang Lin, David W Cheung, Hing-Fung Ting, Shaoliang Peng, Thomas Wong, Chi-Man Liu, Wenjuan Zhu, Siu-Ming Yiu, Yingrui Li, Ruiqiang Li, Lap-Kei Lee, Chang Yu, Jianqiao Zhu, Xiaoqian Zhu, Edward Wu, Frederick C C Leung

doi:10.1371/journal.pone.0065632

Abstract

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2 and GEM, and with the GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Evaluation on real human genome data demonstrates that SOAP3-dp enables more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon EC2, NIH Biowulf and Tianhe-1A.

Highlights

With the rapid advancement of Next-Generation Sequencing technologies, modern sequencers like the Illumina HiSeq 2500 can sequence a human genome into 600 million pairs of 100 bp reads in merely 27 hours.
A simple approach to extend mismatch alignment to gapped alignment is to first identify candidate regions by exact or mismatch alignment of short substrings of the reads, and then use dynamic programming to perform a detailed alignment of each read to those regions.
SOAP3-dp has been successfully deployed on Amazon EC2, NIH Biowulf and the Tianhe-1A computing cloud.
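
The two-stage approach described in the second highlight (seed by exact matching, then refine with dynamic programming) can be illustrated with a small CPU-only sketch. This is not SOAP3-dp's implementation: the k-mer dictionary index, the function names, the seed length `k`, the window padding `pad`, and the linear scoring values (match 1, mismatch -2, gap -3) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_windows(read, index, k, ref_len, pad):
    """Stage 1: exact seed hits propose reference windows that may hold the read."""
    windows = set()
    for offset in range(0, len(read) - k + 1, k):  # non-overlapping seeds
        for pos in index.get(read[offset:offset + k], ()):
            start = max(0, pos - offset - pad)
            end = min(ref_len, pos - offset + len(read) + pad)
            windows.add((start, end))
    return sorted(windows)


def fit_align_score(read, window, match=1, mismatch=-2, gap=-3):
    """Stage 2: best score for aligning the whole read to any substring of the window.

    Scoring values are illustrative assumptions, not SOAP3-dp defaults.
    """
    prev = [0] * (len(window) + 1)  # row 0: the alignment may start anywhere
    for i in range(1, len(read) + 1):
        curr = [i * gap] + [0] * len(window)
        for j in range(1, len(window) + 1):
            same = read[i - 1] == window[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return max(prev)  # the alignment may end anywhere in the window


def align(read, reference, k=4, pad=3):
    """Return (score, window_start, window_end) for the best window, or None."""
    index = build_seed_index(reference, k)
    best = None
    for start, end in candidate_windows(read, index, k, len(reference), pad):
        score = fit_align_score(read, reference[start:end])
        if best is None or score > best[0]:
            best = (score, start, end)
    return best
```

With a reference such as `"ACGTTGCATGCCGATAGGCTAACGTTAGC"`, a read that drops one base relative to the reference (for example `"ATGCCGATAGC"`) still produces exact seed hits, and the dynamic programming step then accounts for the gap. A read with no seed hit yields `None`, which is the sensitivity cost of relying on exact seeds alone.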

Summary

Introduction

With the rapid advancement of Next-Generation Sequencing technologies, modern sequencers like the Illumina HiSeq 2500 can sequence a human genome into 600 million pairs of 100 bp reads (120 gigabases in total) in merely 27 hours. By the end of 2013, sequencing a human genome is projected to cost less than $1,000. Bioinformatics research using sequencing data often starts with aligning the data onto a reference genome, followed by various downstream analyses. Alignment is computationally intensive; the 1000 Genomes pilot paper [1], published in 2010, reported that a 1192-processor cluster was used to align the reads using MAQ [2]. Computing resources of this kind are not available to most laboratories, let alone clinical settings. Ultra-fast alignment tools that do not rely on extensive computing resources are needed.

Journal: PLoS ONE
Publication Date: May 31, 2013
Citations: 144
License type: CC BY 4.0
