Fast and SNP-aware short read alignment with SALT

Wei Quan, Bo Liu, Yadong Wang

doi:10.1186/s12859-021-04088-6

Journal: BMC Bioinformatics
Publication Date: Aug 1, 2021
Citations: 1
License type: open-access

Affiliation: Harbin Institute of Technology

Abstract

Background: DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignments directly affects the accuracy of downstream analyses, such as variant calling and quantitative analysis of transcriptome; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, such a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variations in the population. This limitation can introduce bias and impact the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome and a large number of genetic variants. However, compared to linear reference aligners, an aligner that can store and index all genetic variants has a high cost in memory (RAM) space and leads to extremely long run time. Aligning reads to a graph-model-based index that includes all types of variants is ultimately an NP-hard problem in theory. By contrast, considering only single nucleotide polymorphism (SNP) information will reduce the complexity of the index and improve the speed of sequence alignment.

Results: The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short read alignment tool. SALT uses 5.8 GB of RAM to index a human reference genome (GRCh38) and incorporates 12.8M UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has a similar speed but higher accuracy.

Conclusions: Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome that incorporates an SNP database. We benchmarked SALT using simulated and real datasets. The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.

Highlights

DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies.
The SNP-aware alignment tool (SALT) is distributed under the GNU General Public License (GPL).
The aligners were tested on two simulated datasets and two high-throughput sequencing (HTS) datasets to assess their speed, sensitivity, and accuracy.

Summary

Introduction

Short read alignment is a common first step in most applications of high-throughput sequencing technologies, feeding downstream analyses such as variant calling [2], RNA abundance quantification [3], and expression quantitative trait locus (eQTL) analysis [4]. It plays a critical role in medical and population genetics. Conventional aligners map sequencing reads to a linear reference genome (such as the GRCh38 primary assembly), which represents only one or a few individuals. Such a reference lacks information on the variation in the population and does not reflect the genetic diversity of individuals. Augmenting the reference genome with known genetic variants can reduce the genetic distance between the donor and reference genomes and avoid allelic bias [5].
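To make the idea of reducing the genetic distance concrete, here is a minimal Python sketch that counts mismatches between a read and a reference, first against the linear reference alone and then with known SNP alleles also accepted. It is a toy illustration only and does not reproduce SALT's index or alignment algorithm; the function name `count_mismatches`, the `known_snps` dictionary layout, and the example sequences are all assumptions made for this example.

```python
from typing import Dict, Optional, Set


def count_mismatches(
    read: str,
    reference: str,
    start: int,
    known_snps: Optional[Dict[int, Set[str]]] = None,
) -> int:
    """Count mismatches of `read` placed at `reference[start:]`.

    `known_snps` (hypothetical layout) maps a 0-based reference position
    to the set of alternate alleles known at that position. A read base
    that equals either the reference base or a known alternate allele is
    not counted as a mismatch.
    """
    snps = known_snps or {}
    mismatches = 0
    for offset, base in enumerate(read):
        ref_pos = start + offset
        allowed = {reference[ref_pos]} | snps.get(ref_pos, set())
        if base not in allowed:
            mismatches += 1
    return mismatches


# Toy data (assumed for illustration): the donor carries a T>C SNP
# at 0-based reference position 3.
reference = "ACGTACGTAC"
known_snps = {3: {"C"}}
read = "GCAC"  # sampled from the donor, starting at reference position 2

linear_distance = count_mismatches(read, reference, 2)
snp_aware_distance = count_mismatches(read, reference, 2, known_snps)
```

With the linear reference alone, the read shows one mismatch at the SNP site; once the known alternate allele is accepted, the same placement has zero mismatches, while a base that matches neither allele would still be counted.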