Short Read Alignment Based on Maximal Approximate Match Seeds.

Wei Quan, Yadong Wang, Dengfeng Guan, Guangri Quan, Bo Liu

doi:10.3389/fmolb.2020.572934

Abstract

Sequence alignment is a critical step in many genomic studies, such as variant calling, quantitative transcriptome analysis (RNA-seq), and metagenomic sequence classification. However, alignment performance is strongly affected by repetitive sequences in the reference genome, which are widespread in species from bacteria to mammals. Aligning repetitive sequences can produce a tremendous number of candidate locations, which imposes a heavy computational burden. Most alignment tools therefore simply discard highly repetitive seeds, but this may cause the true alignment to be missed. Using maximal exact matches (MEMs) as seeds is an option, but MEM seeds may fail when sequencing errors or genomic variations fall inside them. Here, we propose a novel sequence alignment algorithm, named MAM, which can efficiently align short DNA sequences. MAM first builds a modified Burrows-Wheeler transform (BWT) structure of a reference genome to accelerate approximate seed matching. Then, MAM uses maximal approximate match (MAM) seeds to reduce the number of candidate locations. Finally, MAM applies affine-gap-penalty dynamic programming to extend the MAM seeds. Experimental results on simulated and real sequencing datasets show that MAM is faster than other state-of-the-art alignment tools. The source code is available at https://github.com/weiquan/mam.
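
To make the final extension step more concrete, the following is a minimal sketch of affine-gap-penalty dynamic programming in the standard three-matrix (Gotoh) form. It is not taken from the MAM source code: the function name `affine_gap_score`, the scoring parameters, and the choice of a global (end-to-end) alignment are all illustrative assumptions, and a real seed-extension routine would typically be banded and semi-global.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-4, gap_open=-6, gap_extend=-1):
    """Global alignment score with affine gaps (Gotoh recurrences).

    A gap of length k costs gap_open + k * gap_extend.
    All scoring values are illustrative assumptions, not MAM defaults.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap (gap in b)
    # Y: b[j-1] aligned to a gap (gap in a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap pays gap_open + gap_extend; extending pays gap_extend.
            X[i][j] = max(
                M[i - 1][j] + gap_open + gap_extend,
                Y[i - 1][j] + gap_open + gap_extend,
                X[i - 1][j] + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open + gap_extend,
                X[i][j - 1] + gap_open + gap_extend,
                Y[i][j - 1] + gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_gap_score("ACGT", "ACGT"))
    print(affine_gap_score("ACGT", "ACT"))
```

The point of the affine model is that one long gap is cheaper than several short ones of the same total length, which is generally considered a better fit for insertions and deletions in sequencing data than a linear gap penalty.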

Highlights

The development of next-generation sequencing (NGS) technologies has led to a rapid decline in the sequencing cost and had a tremendous impact on genomic research (Morozova and Marra, 2008; Reinert et al, 2015)
MAM is distributed under the GNU General Public License (GPL)
All aligners were tested on two simulated datasets and two high-throughput sequencing (HTS) datasets to assess their speed, sensitivity, and accuracy

Summary

Introduction

The development of next-generation sequencing (NGS) technologies has led to a rapid decline in sequencing cost and has had a tremendous impact on genomic research (Morozova and Marra, 2008; Reinert et al., 2015). There has been an intense effort in recent years to develop computational methods and applications to meet the increasing demands for sequencing data analysis (Flicek and Birney, 2009). One of the fundamental tasks in this analysis is sequence alignment. Many alignment methods have been proposed to improve the efficiency and accuracy of sequence alignment, including but not limited to Maq (Li et al., 2008a), SOAP (Li et al., 2008b), Bowtie (Langmead et al., 2009), BWA (Li and Durbin, 2009), and mrsFAST (Hach et al., 2010). Accurately aligning repetitive DNA sequences to the reference genome remains a major issue.

Journal: Frontiers in molecular biosciences	Publication Date: Nov 5, 2020
License type: CC BY 4.0
