FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

Chuan-Le Xiao, Jing-Jie Jin, Jia-Yong Zhong, Xin-Lei Lian, Qing-Yu He, Zhi-Biao Mai, Gong Zhang, Zhang Zhang

doi:10.1371/journal.pone.0094250

Abstract

Correct and bias-free interpretation of deep sequencing data depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT)-based algorithms are fast but less robust. To combine both advantages, we developed FANSe2, an algorithm with an iterative mapping strategy based on the statistics of real-world sequencing error distribution, which substantially accelerates mapping without compromising accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 were experimentally validated, whereas the previous algorithms produced false positives and false negatives. FANSe2 showed remarkably better consistency with microarray data than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
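
The flowcell benchmark in the abstract can be restated as a sustained mapping rate. The short Python sketch below treats the 4.1 hours as total wall-clock time and, as an assumption not stated in the source, splits the work evenly across the three computers; the function name `throughput` is hypothetical.

```python
def throughput(reads: int, hours: float, machines: int) -> tuple[float, float]:
    """Return (overall reads/s, reads/s per machine).

    Assumption: the work is split evenly across machines, which the
    source does not state.
    """
    overall = reads / (hours * 3600)  # convert hours to seconds
    return overall, overall / machines


# Figures quoted in the abstract: 608 M reads, 4.1 hours, three office computers.
overall, per_machine = throughput(608_000_000, 4.1, 3)
print(f"{overall:,.0f} reads/s overall, {per_machine:,.0f} reads/s per computer")
# 41,192 reads/s overall, 13,731 reads/s per computer
```

That works out to roughly 41,200 reads per second in total, or about 13,700 per computer under the even-split assumption.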

Highlights

Mapping millions of next-generation sequencing (NGS) reads accurately to reference sequences is the basis of all deep sequencing applications that utilize reference genomes or transcriptomes, including variant analysis, gene expression and isoform analysis
Longer seeds decrease the number of exact matches exponentially and greatly accelerate the mapping: a 14-nt seed yields 4^(14-8) = 4096-fold fewer exact matches than an 8-nt seed
Novoalign was unable to finish the task in 4 days (Figure 3B). These results showed that FANSe2, as a seed-based algorithm, is approaching the speed of Burrows-Wheeler Transform (BWT)-based algorithms while maintaining similar or higher sensitivity when handling huge datasets
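
The seed-length arithmetic in the highlights assumes each base is one of four equally likely letters, so every extra nucleotide in a seed cuts the expected number of random exact matches by a factor of 4. The Python sketch below makes that concrete and pairs it with a generic seed-and-verify lookup: a hash index of k-mers proposes candidate positions, and a full-length mismatch count confirms them. With non-overlapping seeds, a read carrying fewer mismatches than it has seeds must contain at least one exact seed (pigeonhole principle), so exact seed lookup does not lose such alignments. This is not FANSe2's algorithm: the function names, the uniform-base model, the non-overlapping seed layout, and the mismatch-only (no indel) verification are all illustrative assumptions.

```python
from collections import defaultdict


def fold_reduction(short_seed: int, long_seed: int) -> int:
    """Fold decrease in random exact matches when lengthening a seed (uniform bases assumed)."""
    return 4 ** (long_seed - short_seed)


def expected_random_hits(genome_size: int, seed_length: int) -> float:
    """Expected exact occurrences of one k-mer in a uniform random sequence."""
    return genome_size / 4 ** seed_length


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, index: dict[str, list[int]],
             k: int, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (start, mismatches) for each placement within the mismatch budget."""
    candidates: set[int] = set()
    # Take non-overlapping seeds; each exact hit implies a candidate read start.
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verify each candidate over the full read length.
    hits = []
    for start in sorted(candidates):
        mm = hamming(read, reference[start:start + len(read)])
        if mm <= max_mismatches:
            hits.append((start, mm))
    return hits


print(fold_reduction(8, 14))  # 4096
ref = "ACGTACGTTTGACCATGCAAGT"
print(map_read("TTGACCATGCGA", ref, build_kmer_index(ref, 4), k=4, max_mismatches=2))
# [(8, 1)]
```

For a hypothetical 3-Gb uniform random genome, `expected_random_hits` gives about 45,776 random occurrences per 8-mer versus about 11 per 14-mer; real genomes are repetitive and far from uniform, so actual counts differ.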

Summary

Introduction

Mapping (aligning) millions of next-generation sequencing (NGS) reads accurately to reference sequences is the basis of all deep sequencing applications that utilize reference genomes or transcriptomes, including variant analysis, gene expression and isoform analysis. Accurate mapping to large genomes is still time-consuming [5,6]. Another type of algorithm, based on the Burrows-Wheeler Transform (BWT), e.g. Bowtie and BWA, takes advantage of the suffix/prefix trie to reduce computational complexity and is typically 5–20x faster than seed-based algorithms (reviewed in [2,7]). Such methods can map tens of millions of reads to the human genome within one day on desktop workstations, fueling the rapid growth of NGS applications. In real-world benchmarks, the sensitivity of earlier BWT-based algorithms such as Bowtie and SOAP2 (~80%) still leaves room for improvement when mapping DNA sequencing reads, whereas the upgraded Bowtie achieves almost the same sensitivity as traditional seed-based algorithms while being more than 20x faster [6].
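
To illustrate how a BWT index reduces computational complexity, here is a minimal Python sketch of textbook FM-index backward search. Once the tables are built, counting the exact occurrences of a pattern takes a number of steps proportional to the pattern length, independent of the genome size. This is a generic teaching version, not the data structures used by Bowtie, BWA or SOAP2: the naive rotation sort and the uncompressed occurrence table are only practical for tiny inputs, the function names are hypothetical, and it handles exact matches only, whereas production mappers add sampled suffix arrays, compressed rank structures and strategies for tolerating mismatches.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (naive; illustration only)."""
    text += "$"  # sentinel, sorts before A/C/G/T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def fm_index(bwt_str: str) -> tuple[dict[str, int], dict[str, list[int]]]:
    """Build the C table (first-column offsets) and Occ table (prefix counts)."""
    alphabet = sorted(set(bwt_str))
    c_table: dict[str, int] = {}
    total = 0
    for c in alphabet:
        c_table[c] = total  # number of characters smaller than c
        total += bwt_str.count(c)
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)  # count of c in bwt_str[:i+1]
    return c_table, occ


def count_occurrences(pattern: str, c_table: dict[str, int],
                      occ: dict[str, list[int]], n: int) -> int:
    """Backward search: number of exact occurrences of pattern in the indexed text."""
    lo, hi = 0, n  # half-open interval of sorted-rotation rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


b = bwt("ACGTACGT")
c_table, occ = fm_index(b)
print(count_occurrences("ACG", c_table, occ, len(b)))  # 2
```

Each loop iteration in `count_occurrences` does a constant number of table lookups, which is the property that lets BWT-based mappers search a large genome quickly; the cost is moved into building the index once, ahead of time.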

Journal: PLoS ONE
Publication Date: Apr 17, 2014
Citations: 80
License type: CC BY 4.0
