Multiple Sequence Alignment Based on a Suffix Tree and Center-Star Strategy: A Linear Method for Multiple Nucleotide Sequence Alignment on Spark Parallel Framework.

Wenhe Su, Shaoliang Peng, Quan Zou, Yutong Lu, Xiangke Liao

doi:10.1089/cmb.2017.0040

Abstract

Multiple sequence alignment (MSA) is an essential prerequisite and a dominant method for deducing biological facts from a set of molecular biological sequences. It refers to a series of algorithmic solutions for aligning evolutionarily related sequences while taking into account evolutionary events such as mutations, insertions, deletions, and rearrangements under certain conditions. These methods can be applied to DNA, RNA, or protein sequences. In this work, we take advantage of a center-star strategy to reduce the MSA problem to pairwise alignments, and we use a suffix tree to match identical substrings between the two sequences of each pair. Multiple sequence alignment based on a suffix tree and center-star strategy (MASC) can accomplish MSA in O(mn) time, which is linear in the total input size, where m is the number of sequences and n is the average sequence length. Furthermore, we execute our method on the Spark distributed parallel framework to deal with ever-increasing massive data sets. As we demonstrate experimentally, our method is significantly faster than previous techniques, with no loss in accuracy for highly similar nucleotide sequences such as homologous sequences. Compared with mainstream MSA tools (e.g., MAFFT), MASC can finish the alignment of 67,200 sequences longer than 10,000 bp in 9 minutes, a task that takes MAFFT more than 3.5 days.
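To make the center-star reduction concrete, here is a minimal Python sketch. It is not the MASC implementation. It assumes the first sequence is the center, and it aligns each pair with plain Needleman-Wunsch dynamic programming instead of suffix-tree matching, so each pairwise step is quadratic rather than linear. The function names, the scoring values, and the example sequences are all illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns the two gapped strings.
    Scoring values are arbitrary illustrative defaults."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def center_star(seqs, center_index=0):
    """Align every sequence to one center, then merge the pairwise results.
    Assumption: the center is simply seqs[center_index]."""
    center = seqs[center_index]
    pairs = [needleman_wunsch(center, s) for s in seqs]

    # gaps[k] = largest gap run that any pairwise alignment opens
    # before center[k] (k == len(center) means after the last base).
    gaps = [0] * (len(center) + 1)
    for gapped_center, _ in pairs:
        k = run = 0
        for ch in gapped_center:
            if ch == "-":
                run += 1
            else:
                gaps[k] = max(gaps[k], run)
                k, run = k + 1, 0
        gaps[k] = max(gaps[k], run)

    # Rebuild each row so that every row has the same gap columns.
    rows = []
    for gapped_center, gapped_seq in pairs:
        row, k, run = [], 0, []
        for c, s in zip(gapped_center, gapped_seq):
            if c == "-":
                run.append(s)  # base inserted relative to the center
            else:
                row.append("".join(run).ljust(gaps[k], "-"))
                row.append(s)
                k, run = k + 1, []
        row.append("".join(run).ljust(gaps[k], "-"))
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    for row in center_star(["ACGTACGT", "ACGTCGT", "ACGGTACGT", "ACGTACG"]):
        print(row)
```

Only m pairwise alignments against the center are computed, and each is independent of the others, which is what makes this step a natural fit for a distributed map. Swapping the quadratic pairwise step for matching of long identical substrings, as the abstract describes, is where the stated linear behavior on highly similar sequences would come from.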

Journal: Journal of computational biology: a journal of computational molecular cell biology
Publication Date: Nov 8, 2017
Citations: 19

Similar Papers

MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts
Xin Deng ... Jianlin Cheng
BMC Bioinformatics | VOL. 12
01 Dec 2011

A novel fast multiple nucleotide sequence alignment method based on FM-index
Huan Liu ... Quan Zou
Briefings in Bioinformatics | VOL. 23
10 Dec 2021

A hybrid algorithm for multiple DNA sequence alignment
Kokila K Perera ... C Thusangi Wannige
01 Sep 2016

MARS: improving multiple circular sequence alignment using refined sequences
Lorraine A K Ayad ... Solon P Pissis
BMC Genomics | VOL. 18
14 Jan 2017
