SALT: a fast, memory-efficient and SNP-aware short read alignment tool

Wei Quan, Bo Liu, Yadong Wang

doi:10.1109/bibm47256.2019.8983162

Abstract

DNA sequence alignment tools play an essential role in genomics and genetics. The accuracy of alignment directly affects the accuracy of downstream analyses such as variant calling, so mapping reads to the reference genome rapidly and accurately has become a central topic in bioinformatics. Conventional read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, a linear reference represents the genomes of only one or a few individuals and lacks the variation present in the population, which can introduce bias and reduce the sensitivity and accuracy of mapping. Recently, a few aligners have begun to map reads to a graph that captures the entire human genome along with a large number of variants. Compared with linear-reference aligners, however, storing and indexing all genetic variants requires a large amount of memory (RAM) and leads to extremely long runtimes, and aligning reads to a graph-based index that includes the whole set of variants is, in theory, an NP-hard problem. Considering only SNP information reduces the complexity of the index and improves alignment speed. Herein, we present SALT, an SNP-aware alignment tool that aligns reads to a reference genome incorporating an SNP database. SALT is benchmarked on both simulated reads and a real dataset. The results demonstrate that SALT maps reads to the reference genome efficiently and significantly improves accuracy and sensitivity, showing that incorporating SNP information into read alignment makes it more sensitive and accurate. Moreover, it helps to discover novel variants. SALT is distributed under the GNU General Public License (GPL), and its source code is freely available at https://github.com/weiquan/SALT.
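To make the idea of "SNP-aware" alignment concrete, the following is a minimal illustrative sketch, not SALT's actual algorithm or data structures (SALT builds an index; this code scans exhaustively). It assumes a hypothetical SNP table mapping 0-based reference positions to known alternate alleles, and scores an ungapped placement of a read so that a base matching a known SNP allele is not counted as a mismatch. The function names `build_snp_table`, `snp_aware_mismatches`, and `best_placement` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical SNP table: 0-based reference position -> set of known alternate alleles.
SnpTable = Dict[int, Set[str]]


def build_snp_table(snps: Iterable[Tuple[int, str]]) -> SnpTable:
    """Collect (position, alt_allele) pairs into a position -> alleles map."""
    table: SnpTable = {}
    for pos, alt in snps:
        table.setdefault(pos, set()).add(alt.upper())
    return table


def snp_aware_mismatches(read: str, ref: str, start: int, snp_table: SnpTable) -> int:
    """Count mismatches of an ungapped placement of `read` at ref[start:].

    A read base that differs from the reference but equals a known SNP
    allele at that position is treated as a match (not penalized).
    """
    if start < 0 or start + len(read) > len(ref):
        raise ValueError("read placement runs off the reference")
    mismatches = 0
    for i, base in enumerate(read.upper()):
        pos = start + i
        if base == ref[pos].upper():
            continue  # matches the linear reference
        if base in snp_table.get(pos, set()):
            continue  # matches a known SNP allele
        mismatches += 1
    return mismatches


def best_placement(read: str, ref: str, snp_table: SnpTable) -> Optional[Tuple[int, int]]:
    """Try every ungapped placement; return (start, mismatches), leftmost on ties."""
    best: Optional[Tuple[int, int]] = None
    for start in range(len(ref) - len(read) + 1):
        mm = snp_aware_mismatches(read, ref, start, snp_table)
        if best is None or mm < best[1]:
            best = (start, mm)
    return best


if __name__ == "__main__":
    ref = "ACGTACGTTGCA"
    snps = build_snp_table([(5, "T")])  # hypothetical C>T SNP at position 5
    read = "ACGTATGT"                   # carries the T allele at position 5
    print(best_placement(read, ref, snps))  # SNP-aware: (0, 0)
    print(best_placement(read, ref, {}))    # linear reference only: (0, 1)
```

In this toy case the read carries the alternate allele, so against the plain linear reference it costs one mismatch, while the SNP-aware score treats it as an exact match. This is the kind of reference bias the abstract refers to; restricting the extra information to SNPs keeps the lookup a simple per-position check rather than a traversal of a general variation graph.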
