MISHIMA - a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

Kirill Kryukov,Naruya Saitou

doi:10.1186/1471-2105-11-142

Abstract

Background

Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences.

Results

We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences.

Conclusions

MISHIMA provides improved performance compared to commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
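The anchor-and-split idea outlined in the Results can be illustrated with a minimal Python sketch. This is not the MISHIMA implementation: the function names, the choice of "occurs exactly once in every sequence" as the rarity criterion, and the greedy chaining step are all assumptions made for illustration, and the external aligner that would process each fragment is left out.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {km: i for i, km in enumerate(kmers) if counts[km] == 1}


def shared_anchors(seqs, k):
    """Return a collinear chain of k-mers that are unique in every sequence.

    Each anchor is a tuple of start positions, one per sequence.
    The chain is built greedily (an assumption of this sketch), so it is
    consistent but not guaranteed to be the longest possible chain.
    """
    per_seq = [unique_kmers(s, k) for s in seqs]
    common = set(per_seq[0]).intersection(*per_seq[1:])
    anchors = sorted(tuple(p[km] for p in per_seq) for km in common)
    chain = []
    for a in anchors:
        # Keep an anchor only if it lies after the previous one in all sequences.
        if not chain or all(x >= y + k for x, y in zip(a, chain[-1])):
            chain.append(a)
    return chain


def split_at_anchors(seqs, chain, k):
    """Cut all sequences at the anchors into blocks that can be aligned independently."""
    blocks = []
    prev = [0] * len(seqs)
    for a in chain:
        blocks.append([s[p:q] for s, p, q in zip(seqs, prev, a)])  # between anchors
        blocks.append([s[q:q + k] for s, q in zip(seqs, a)])       # the anchor itself
        prev = [q + k for q in a]
    blocks.append([s[p:] for s, p in zip(seqs, prev)])             # trailing fragment
    return blocks


seqs = ["AAGATTACACC", "TTGATTACAGG", "CGATTACAT"]
chain = shared_anchors(seqs, 7)
blocks = split_at_anchors(seqs, chain, 7)
```

In this toy input the only 7-mer unique to and shared by all three sequences is GATTACA, so each sequence is cut into a left fragment, the anchor, and a right fragment. Concatenating the blocks of one sequence restores that sequence, which is the property that lets partial alignments be assembled into a complete one.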

Highlights

Large nucleotide sequence datasets are becoming increasingly common objects of comparison
We found that knowing the number of occurrences of each k-tuple in the original sequence dataset is not enough to efficiently decide which k-tuples are more likely to represent the local homology
Human mtDNA genomes: complete human mitochondrial DNA genome sequences were used as an example of very closely related sequences
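One way to see why a dataset-wide k-tuple count can be ambiguous is sketched below in Python. This is an illustration only and not necessarily the authors' reasoning: the helper name and the toy datasets are hypothetical. Two datasets give the same total count for a k-tuple, yet only one of them has the k-tuple once in every sequence.

```python
def per_sequence_counts(seqs, kmer):
    """Count occurrences of kmer in each sequence (overlapping matches included)."""
    k = len(kmer)
    return [
        sum(s[i:i + k] == kmer for i in range(len(s) - k + 1))
        for s in seqs
    ]


# Toy data (hypothetical): ACG appears three times in total in both datasets.
dataset_a = ["TTACGTT", "GGACGGG", "CCACGCC"]      # once in every sequence
dataset_b = ["ACGTACGTACG", "GGGGGGG", "CCCCCCC"]  # three times in one sequence

counts_a = per_sequence_counts(dataset_a, "ACG")
counts_b = per_sequence_counts(dataset_b, "ACG")
total_a = sum(counts_a)
total_b = sum(counts_b)
```

Both totals are equal, but only in `dataset_a` could ACG serve as a candidate shared position across all sequences, so extra information beyond the total (here, the per-sequence breakdown) is needed to tell the two cases apart.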

Journal: BMC Bioinformatics
Publication Date: Mar 18, 2010
Citations: 41
License type: CC BY 2.0
