Comprehensive comparison of graph based multiple protein sequence alignment strategies

Ilya Plyusnin, Liisa Holm

doi:10.1186/1471-2105-13-64

Abstract

Background: Alignment of multiple protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which allows different components of the alignment process to be swapped and, thus, their contribution to alignment quality and computation time to be investigated. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on BAliBASE and SABmark.

Results: Our results indicate the optimal alignment strategy among the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm to build a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree dependent partitioning as a post processing step, which was shown to be comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal.

Conclusions: This is the first time multiple protein alignment strategies have been comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA and as a supplementary file attached to this article (see Additional file 1).
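To make the idea of triplet library extension more concrete, here is a minimal sketch in Python. It is not the authors' SeqAn-based implementation. It assumes the commonly described T-Coffee-style rule, in which an edge between residues x and y gains min(w(x,z), w(z,y)) for every intermediate residue z linked to both. The function name `triplet_extend`, the `add_new_edges` flag and the dictionary-based library layout are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def triplet_extend(library, add_new_edges=True):
    """Return a triplet-extended copy of a pairwise alignment library.

    library maps frozenset({a, b}) -> weight, where a and b are residues
    written as (sequence_id, position) tuples. This layout is an assumption
    made for the sketch, not the data structure used in the paper.
    """
    # Adjacency view of the library: residue -> {neighbour: weight}.
    neighbours = defaultdict(dict)
    for edge, weight in library.items():
        a, b = tuple(edge)
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = dict(library)
    for z, adj in neighbours.items():
        # Every pair (x, y) of neighbours of z forms a triplet x - z - y.
        for x, y in combinations(adj, 2):
            if x[0] == y[0]:
                continue  # residues of the same sequence are never aligned
            key = frozenset((x, y))
            if key not in library and not add_new_edges:
                continue  # only reweight edges that already exist
            # All contributions are computed from the original weights.
            extended[key] = extended.get(key, 0) + min(adj[x], adj[y])
    return extended
```

With `add_new_edges=True`, a pair of residues that was never aligned directly can still receive an edge if both are aligned to a common third residue, which is one plausible reading of "introduction of new edges" in the abstract.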

Highlights

Alignment of protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology
A variety of methods used by modern molecular biology such as structural modeling, function annotation, phylogenetic analysis and similarity searches are based on multiple protein sequence alignments (MPSA)
Our results suggest that single linkage clustering is optimal for building the guiding tree, regardless of the benchmark set
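As an illustration of how single linkage clustering can produce a guiding tree, the following Python sketch repeatedly merges the two clusters whose closest members are nearest to each other. It is a naive version written for clarity, not the implementation evaluated in the paper. The function name `single_linkage_tree`, the nested-tuple tree representation and the toy distance matrix are assumptions made for this example.

```python
def single_linkage_tree(names, dist):
    """Build a guide tree as nested tuples using single linkage clustering.

    names: list of sequence labels.
    dist:  symmetric square matrix (list of lists) of pairwise distances,
           indexed in the same order as names.
    """
    # Each cluster is (subtree, list of member indices into names).
    clusters = [(name, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the minimum distance
                # between any member of one cluster and any of the other.
                d = min(dist[i][j]
                        for i in clusters[a][1]
                        for j in clusters[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]


# Toy distances for four hypothetical sequences.
names = ["s1", "s2", "s3", "s4"]
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.5, 0.7],
    [0.9, 0.5, 0.0, 0.2],
    [0.8, 0.7, 0.2, 0.0],
]
tree = single_linkage_tree(names, dist)
```

In a progressive aligner, the resulting tree would determine the order in which sequences and sub-alignments are merged, starting from the leaves.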

Summary

Introduction

Alignment of multiple protein sequences (MPSA) is the starting point for a multitude of applications in molecular biology. We present a novel MPSA program based on the SeqAn sequence alignment library. A variety of methods used in modern molecular biology, such as structural modeling, function annotation, phylogenetic analysis and similarity searches, are based on MPSA. An MPSA provides position-specific information on evolutionarily conserved characters, correlation between characters and their distribution. These features can be used in many further applications for which the quality of the MPSA is crucial [1]. The quality of the alignment is evaluated using a scoring function based on gap penalties and a substitution matrix. When two sequences are aligned, an exact solution can be found
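For the pairwise case, the exact optimum under such a scoring function is classically computed by dynamic programming (the Needleman-Wunsch recurrence). The sketch below computes only the optimal global alignment score. To stay self-contained it assumes a linear gap penalty and replaces a real substitution matrix such as BLOSUM62 with a simple match/mismatch score; the function name `global_alignment_score` and the default parameter values are illustrative, not taken from the paper.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Uses a linear gap penalty and a toy match/mismatch score in place of
    a substitution matrix (both are assumptions of this sketch).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            ))
        prev = curr
    return prev[-1]
```

The table has (len(a) + 1) × (len(b) + 1) cells, so time grows with the product of the two lengths. Extending the same recurrence to many sequences at once multiplies the table dimensions, which is why multiple alignment programs rely on heuristics such as progressive alignment.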



Journal: BMC Bioinformatics	Publication Date: Apr 29, 2012
Citations: 35	License type: CC BY 2.0


