Abstract
Aligning billions of reads generated by next-generation sequencing (NGS) to reference sequences, termed “mapping”, is a time-consuming and computationally intensive step in most NGS applications. A fast, accurate, and robust mapping algorithm is therefore highly needed. We developed the FANSe3 mapping algorithm, which can map a 30× human whole-genome sequencing (WGS) dataset within 30 min, a 50× human whole-exome sequencing (WES) dataset within 30 s, and a typical mRNA-seq dataset within seconds on a single server node, without any hardware acceleration. Like its predecessor FANSe2, FANSe3 keeps the error rate as low as 10⁻⁹ in most cases, making it more robust than Burrows–Wheeler transform (BWT)-based algorithms. The error allowance hardly affected the identification of a driver somatic mutation in clinically relevant WGS data, and FANSe3 provided robust gene expression profiles regardless of the parameter settings and sequencer used. Designed for high-performance cloud-computing infrastructures, the novel algorithm will break the bottleneck of speed and accuracy in NGS data analysis and promote NGS applications in various fields. The FANSe3 algorithm can be downloaded from the website: http://www.chi-biotech.com/fanse3/.
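To put the WGS figure in perspective, the short calculation below converts “30× in 30 min” into a read throughput. It is a back-of-the-envelope sketch, not a benchmark from the paper: the ~3.1 Gb genome length and the 150-bp read length are assumptions, since the abstract states neither, and the helper name `reads_needed` is hypothetical.

```python
# Back-of-the-envelope throughput for "a 30x human WGS dataset within 30 min".
# GENOME_BP and READ_LEN are assumptions; the abstract gives neither.
GENOME_BP = 3.1e9        # approximate human genome length in bp (assumption)
DEPTH = 30               # sequencing depth stated in the abstract
READ_LEN = 150           # hypothetical read length in bp
WALL_SECONDS = 30 * 60   # "within 30 min"


def reads_needed(genome_bp: float, depth: float, read_len: int) -> float:
    """Number of reads whose combined length covers the genome `depth` times."""
    return genome_bp * depth / read_len


n_reads = reads_needed(GENOME_BP, DEPTH, READ_LEN)
rate = n_reads / WALL_SECONDS
print(f"{n_reads:.3g} reads, about {rate:,.0f} reads/s")
```

Under these assumptions, 30× coverage corresponds to roughly 6.2 × 10⁸ reads, i.e. on the order of 3.4 × 10⁵ reads per second; shorter reads would raise the required rate proportionally.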
Highlights
Next-generation sequencing (NGS) is a key cornerstone in precision medicine
One of the major drawbacks is that the genome index requires an enormous amount of RAM that is proportional to the length of the reference sequence
This leads to high RAM usage: 64 GB of RAM is required to map reads to a human genome on a multi-core system
Summary
Next-generation sequencing (NGS) is a key cornerstone in precision medicine. With the rapid decrease in the experimental cost of NGS, human whole-genome sequencing (WGS) at 30× depth can be performed for $700, and mRNA sequencing (RNA-seq) costs only $80. Burrows–Wheeler transform (BWT)-based mapping algorithms such as the Burrows–Wheeler Aligner (BWA) and Bowtie are the most widely used mapping tools in NGS applications, owing to their large speed advantage over conventional seed-based algorithms. They can map a human WGS dataset within 1 day on a server node (Hung and Weng 2017). However, because of sequencing errors and deviations between reads and reference sequences, BWT-based algorithms generally lose accuracy when the error rate exceeds 2%. Whole exome sequencing (WES) for 57 patients with genetic diseases failed to detect any Human Gene Mutation Database-cataloged
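BWT-based mappers owe much of their speed to an FM index built on the Burrows–Wheeler transform, which lets exact matches of a read be counted in a number of steps proportional to the read length rather than the genome length. The toy sketch below illustrates that backward-search idea on a short string. It is a textbook illustration, not code from BWA, Bowtie, or FANSe3, and all function names are hypothetical; real tools use compressed, sampled occurrence tables plus suffix arrays, and must handle mismatches by backtracking over this interval search, which becomes expensive as more differences are allowed.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; toy use only)."""
    s = text + "$"  # '$' sorts before letters and marks the end of the text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_fm(bwt_str: str):
    """Return C[ch] (number of chars smaller than ch) and occ[ch][i] (count of ch in bwt_str[:i])."""
    counts = Counter(bwt_str)
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, c in enumerate(bwt_str):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if c == ch else 0)
    return C, occ


def count_exact(pattern: str, C: dict, occ: dict, n: int) -> int:
    """Backward search: one interval update per pattern character."""
    lo, hi = 0, n  # half-open interval [lo, hi) over the sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0  # character absent from the reference
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0  # interval became empty: no exact match
    return hi - lo


reference = "ACGTACGTTACG"
b = bwt(reference)
C, occ = build_fm(b)
print(count_exact("ACG", C, occ, len(b)))  # 3
```

Once the index is built, each query only walks the read from right to left, which is why exact or near-exact reads map quickly, while allowing many mismatches multiplies the intervals that must be explored.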