A statistical score for assessing the quality of multiple sequence alignments

Virpi Ahola, Mauno Vihinen, Tero Aittokallio, Esa Uusipaikka

doi:10.1186/1471-2105-7-484

Abstract

Background

Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees, etc. Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences is a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments.

Results

To address the need for an objective evaluation framework, we introduce a statistical score that assesses the quality of a given multiple sequence alignment. The quality assessment is based on counting the number of significantly conserved positions in the alignment using an importance sampling method in conjunction with a statistical profile analysis framework. We first evaluate a novel objective function used in the alignment quality score for measuring the positional conservation. The results for the Src homology 2 (SH2) domain, Ras-like proteins, peptidase M13, subtilase and β-lactamase families demonstrate that the score can distinguish sequence patterns with different degrees of conservation. Secondly, we evaluate the quality of the alignments produced by several widely used multiple sequence alignment programs using the novel alignment quality score and the commonly used sum of pairs method. According to these results, the Mafft strategy L-INS-i outperforms the other methods, although the difference between Probcons, TCoffee and Muscle is mostly insignificant. The novel alignment quality score provides results similar to those of the sum of pairs method.

Conclusion

The results indicate that the proposed statistical score is useful in assessing the quality of multiple sequence alignments.
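For readers unfamiliar with the sum of pairs method mentioned above, the following is a minimal sketch of one common form of it: every pair of symbols in every alignment column is scored and the scores are added up. The function name `sum_of_pairs` and the `match`, `mismatch` and `gap` values are illustrative assumptions, not parameters taken from the paper, which may use a substitution matrix or a reference-based variant instead.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    The scoring values are hypothetical defaults chosen for illustration.
    """
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    total = 0
    # zip(*alignment) iterates over the alignment column by column.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap paired with a gap contributes nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["ACGT", "ACGT", "AC-T"]))
```

A higher value indicates that more pairs of residues agree column by column. Because the score depends on the chosen scoring values, it is only comparable between alignments of the same sequence set scored with the same settings.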

Highlights

Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees, etc.
The results indicate that the proposed statistical score is useful in assessing the quality of multiple sequence alignments.
Evaluating the maxZ score for positional conservation: we study the practical performance of the maxZ score in the Src homology 2 (SH2) domain, Ras-like proteins, peptidase M13, subtilase and β-lactamase families.

Summary

Introduction

BMC Bioinformatics 2006, 7:484 http://www.biomedcentral.com/1471-2105/7/484

Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees, etc. Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences is a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments. The results of annotation of gene/protein sequences, prediction of protein structures or building of phylogenetic trees, for instance, are critically dependent on the quality of the given alignment. It is recognized that the automatic construction of a multiple sequence alignment for a set of remotely related sequences can be a very demanding task. Additional requirements for a good conservation score include the possibility to incorporate (iv) the effect of gaps and (v) sequence weighting into (vi) a simple scoring strategy.
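To make the requirements on gaps, sequence weighting and simplicity more concrete, here is a minimal sketch of a per-column conservation measure that takes all three into account. It is not the maxZ score from the paper: it swaps in a weighted Shannon entropy scaled by the weighted fraction of non-gap symbols. The function name `column_conservation`, the `weights` argument and the default `alphabet_size` of 20 are assumptions made for illustration.

```python
import math
from collections import defaultdict


def column_conservation(column, weights=None, alphabet_size=20):
    """Illustrative conservation of one alignment column, in [0, 1].

    Not the paper's maxZ score: a weighted-entropy stand-in that shows
    how gaps and sequence weights can enter a simple scoring strategy.
    """
    if weights is None:
        weights = [1.0] * len(column)  # unweighted sequences by default
    if len(weights) != len(column):
        raise ValueError("need exactly one weight per sequence")

    total_weight = sum(weights)
    residue_weight = defaultdict(float)
    gap_weight = 0.0
    for symbol, w in zip(column, weights):
        if symbol == "-":
            gap_weight += w
        else:
            residue_weight[symbol] += w

    non_gap = total_weight - gap_weight
    if non_gap <= 0:
        return 0.0  # a column of gaps only carries no conservation signal

    # Weighted Shannon entropy of the residue distribution (gaps excluded).
    entropy = -sum(
        (w / non_gap) * math.log(w / non_gap)
        for w in residue_weight.values()
        if w > 0
    )
    conservation = 1.0 - entropy / math.log(alphabet_size)
    # Scale by the weighted fraction of sequences that are not gapped.
    return conservation * (non_gap / total_weight)


if __name__ == "__main__":
    print(column_conservation("AAAA"))
    print(column_conservation("AA--"))
    print(column_conservation("AC", weights=[3.0, 1.0]))
```

In this sketch a fully conserved, gap-free column scores 1, gaps reduce the score in proportion to their weight, and down-weighting redundant sequences changes how much each residue contributes.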

Journal: BMC Bioinformatics	Publication Date: Nov 3, 2006
Citations: 93	License type: CC BY 2.0

Similar Papers

Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark
Jacek Błażewicz ... Piotr Formanowicz
International Journal of Applied Mathematics and Computer Science | VOL. 19
01 Dec 2009

A weighting system and algorithm for aligning many phylogenetically related sequences.
Osamu Gotoh
Bioinformatics | VOL. 11
01 Jan 1995

Assessing Multiple Sequence Alignments Using Visual Tools
Catherine L. ... Etsuko N.
02 Nov 2011

No so HoT - heads or tails is not able to reliably compare multiple sequence alignments.
Michael J Wise
Cladistics : the international journal of the Willi Hennig Society | VOL. 26
11 Nov 2009
