A new protein linear motif benchmark for multiple sequence alignment software

Emmanuel Perrodou, Olivier Poch, Toby J Gibson, Claudia Chica, Julie D Thompson

doi:10.1186/1471-2105-9-213

Abstract

Background
Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking for regulatory complex assembly, and protein processing and cleavage. Methods for LM detection are now being developed, and they depend strongly on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align with classical multiple sequence alignment programs, which are specifically optimised to align globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.

Results
We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs, compared with 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed for full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.

Conclusion
We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences, and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
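The abstract reports the share of LMs that each program "correctly aligns" against the reference alignments. One common way to quantify this is a sum-of-pairs style score restricted to the motif: take every pair of residues that the reference alignment places in the same motif column, and count how many of those pairs the test alignment also places in a shared column. The sketch below is a hypothetical illustration of that idea, not necessarily the exact motif score used in the benchmark. The function names, the dict-of-gapped-strings alignment format, and the use of `-` as the only gap character are all assumptions.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Residue = Tuple[str, int]        # (sequence name, 0-based residue index)
Pair = Tuple[Residue, Residue]


def residue_indices(row: str) -> List[Optional[int]]:
    """Map each alignment column to a residue index, or None for a gap ('-')."""
    out: List[Optional[int]] = []
    k = 0
    for ch in row:
        if ch == "-":
            out.append(None)
        else:
            out.append(k)
            k += 1
    return out


def aligned_pairs(alignment: Dict[str, str], columns: Iterable[int]) -> Set[Pair]:
    """Collect every pair of residues that share a column among `columns`."""
    index = {name: residue_indices(row) for name, row in alignment.items()}
    pairs: Set[Pair] = set()
    for col in columns:
        # Sorting by sequence name keeps pair order identical across alignments.
        residues = [(name, idx[col]) for name, idx in sorted(index.items())
                    if idx[col] is not None]
        pairs.update(combinations(residues, 2))
    return pairs


def motif_pair_score(reference: Dict[str, str], test: Dict[str, str],
                     motif_cols: Iterable[int]) -> float:
    """Fraction of reference motif residue pairs that the test alignment reproduces.

    Hypothetical metric: motif_cols are column indices in the *reference* alignment.
    """
    ref_pairs = aligned_pairs(reference, motif_cols)
    if not ref_pairs:
        return 0.0
    test_len = len(next(iter(test.values())))
    test_pairs = aligned_pairs(test, range(test_len))
    return len(ref_pairs & test_pairs) / len(ref_pairs)
```

For example, with a toy reference `{"s1": "MKRSPLE", "s2": "M-RSPLE"}` and a motif in columns 2 to 4 (`RSP`), scoring the reference against itself gives 1.0, while a test alignment that shifts `s2` by one column gives 0.0. A per-motif score like this could then be thresholded to decide whether a motif counts as correctly aligned.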

Highlights

Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins
We have shown that none of the programs currently available is capable of reliably aligning linear motifs (LMs) in distantly related sequences and we have highlighted a number of specific problems
Linear motif functional sites are identified by patterns (ELM regular expressions) that are similar to PROSITE patterns [27]
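Because ELM patterns are written as regular expressions, candidate motif instances can be located with an ordinary regex engine. The sketch below assumes Python's `re` syntax is close enough to the ELM pattern syntax for simple patterns; the function name and the pattern `[ST]P.[KR]` are illustrative only and are not taken from a specific ELM entry. A lookahead is used so that overlapping matches are reported, since short motifs can overlap in a sequence.

```python
import re
from typing import List, Tuple


def find_motif_hits(sequence: str, pattern: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched text) for every match, including overlapping ones.

    Hypothetical helper: wrapping the pattern in a zero-width lookahead lets
    re.finditer test every start position, so overlapping hits are not skipped.
    The outer capture group is group 1 because it is the first opening parenthesis.
    """
    wrapped = re.compile(f"(?=({pattern}))")
    return [(m.start(1), m.end(1), m.group(1)) for m in wrapped.finditer(sequence)]


# Illustrative pattern: S or T, then P, then any residue, then K or R.
hits = find_motif_hits("MASPRKLTPEKSPAR", r"[ST]P.[KR]")
print(hits)  # [(2, 6, 'SPRK'), (7, 11, 'TPEK'), (11, 15, 'SPAR')]
```

Short patterns like this match frequently by chance, which is one reason pattern matches alone yield false positive motifs and why conservation across aligned homologues is used as additional evidence.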

Summary

Introduction

Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. Compared with globular domains, less research has been directed towards the analysis of the large segments of multidomain proteins that are non-globular, intrinsically lacking the capability to fold into a defined tertiary structure [4,5]. Sometimes such regions act as linkers connecting globular domains, in which case the amino acid sequence is not critical to function. Very often, however, these unstructured regions contain important functional sites such as protein interaction sites, cell compartment targeting signals, post-translational modification sites or cleavage sites. Given the fundamental roles these motifs play in cell regulation and signalling, identifying them will be of crucial importance in many biological disciplines.



Journal: BMC Bioinformatics
Publication Date: Apr 25, 2008
Citations: 71
License type: cc-by
