Kalign – an accurate and fast multiple sequence alignment algorithm

Timo Lassmann, Erik L. L. Sonnhammer

doi:10.1186/1471-2105-6-298

Open Access

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2005
Citations: 682
License type: cc-by

Affiliation: Karolinska Institutet

Abstract

Background: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics.

Results: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods.

Conclusion: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.
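The abstract names the Wu-Manber string-matching algorithm but does not say how Kalign applies it. As a rough illustration only, the sketch below shows the general bit-parallel idea behind Wu-Manber approximate matching, restricted to substitutions (no insertions or deletions). The function name `approx_find`, the error budget `k`, and the example strings are assumptions made for this sketch and are not taken from Kalign.

```python
def approx_find(pattern, text, k):
    """Return 0-based end positions in `text` where `pattern` matches
    with at most `k` substitutions (bit-parallel, Wu-Manber style).

    Hypothetical helper for illustration; not Kalign's implementation.
    """
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")

    # masks[ch] has bit i set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1       # keep only the m pattern bits
    accept = 1 << (m - 1)     # bit set when the whole pattern has matched

    # state[d] bit i is set when pattern[:i+1] matches the text ending at
    # the current position with at most d substitutions.
    state = [0] * (k + 1)
    hits = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_old = state[0]
        state[0] = ((state[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            cur_old = state[d]
            exact_step = ((state[d] << 1) | 1) & mask   # current char matches
            subst_step = (prev_old << 1) | 1            # spend one substitution
            state[d] = (exact_step | subst_step) & full
            prev_old = cur_old
        if state[k] & accept:
            hits.append(pos)
    return hits
```

For example, `approx_find("ACD", "XACDXAEDX", 0)` returns `[3]`, while allowing one substitution with `k=1` returns `[3, 7]`, because `AED` differs from `ACD` at a single position.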

Highlights

The alignment of multiple protein sequences is a fundamental step in the analysis of biological data
We demonstrate that Kalign is well suited both in terms of speed and accuracy to deal with the challenges posed by large-scale comparative genomics
Only conserved blocks in the Balibase alignments were used for evaluation

Summary

Introduction

The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. In contrast to pairwise alignment, multiple sequence alignment (MSA) can reveal subtle similarities among large groups of proteins. Such information can be used in phylogenetic analysis [2], function prediction [3], HMM building [4], finding consensus sequences, and in the identification of residues critical to function. Global methods tend to outperform local methods when sequences are related over their entire length [14], while local methods are superior in multiple-domain cases where sequences may share only one common domain [15]. Since it is rarely known how sequences are related prior to the alignment, a method attempting to combine both local and global features was proposed by Notredame et al. [16].
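The contrast between global and local alignment can be made concrete with a small pairwise example. The sketch below computes a global score (Needleman-Wunsch style) and a local score (Smith-Waterman style) in one pass. These two textbook algorithms are used here only for illustration and are not named in the text above; the function name `alignment_scores` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for simplicity.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-1):
    """Return (global_score, local_score) for sequences a and b.

    Illustrative scoring only: identity match/mismatch and a linear
    gap penalty, not a protein substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]

    # Global alignment pays for leading gaps; local alignment does not.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # A local alignment may restart anywhere, hence the floor at 0.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[-1][-1], best_local
```

With these assumed scores, `alignment_scores("XXXACDEXXX", "ACDE")` returns `(-2, 4)`: the shared segment scores well locally, while the global score is dragged down by the six unavoidable gap positions. For two sequences related over their whole length, such as `alignment_scores("ACDE", "ACDE")`, both scores are 4.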
