A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences

Claudia Chica,Toby J Gibson,Rodrigo López,Alberto Labarga,Cathryn M Gould

doi:10.1186/1471-2105-9-229


Open Access

https://doi.org/10.1186/1471-2105-9-229


Journal: BMC Bioinformatics
Publication Date: May 6, 2008
Citations: 84
License type: CC BY

Affiliation: European Bioinformatics Institute

Abstract

Background: The structure of many eukaryotic cell regulatory proteins is highly modular. They are assembled from globular domains, segments of natively disordered polypeptide and short linear motifs. The latter are involved in protein interactions and the formation of regulatory complexes. The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of their modules. It is therefore desirable to predict linear motifs efficiently and with some degree of accuracy, yet sequence database searches for these short patterns return results that are not statistically significant.

Results: We have developed a method for scoring the conservation of linear motif instances. It requires only information derived from the primary sequence (e.g. a multiple alignment and a sequence tree) and takes into account the degenerate nature of linear motif patterns. In our benchmark, the method correctly scores 86% of the known positive instances, while distinguishing them from random matches in 78% of the cases. The conservation score is implemented as a real-time application designed to be integrated into other tools. It is currently accessible via a Web Service or through a graphical interface.

Conclusion: The conservation score improves the prediction of linear motifs by discarding matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences. It is especially useful for instances in non-structured regions of proteins, where a domain-masking filter is not applicable.
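
To make the idea of scoring a motif instance against a multiple alignment concrete, here is a deliberately simplified sketch. It is not the authors' tree-based conservation score: the sequence tree is replaced by a hypothetical, user-supplied weight per homologue (which could, for example, be derived from branch lengths), and the score is just the weighted fraction of homologues whose aligned window still matches the motif regex. The function names `aligned_segment` and `conservation_score`, the toy alignment and the `[ST]P` pattern are all illustrative assumptions.

```python
import re

GAP_CHARS = "-."

def aligned_segment(aligned_query: str, aligned_other: str, start: int, end: int) -> str:
    """Slice of another aligned sequence lying under the query's motif match.

    start/end are ungapped query coordinates (end exclusive), as returned by
    re.finditer on the ungapped query sequence.
    """
    # Alignment columns that hold a query residue, in query order.
    cols = [i for i, c in enumerate(aligned_query) if c not in GAP_CHARS]
    return aligned_other[cols[start]:cols[end - 1] + 1]

def conservation_score(pattern, aligned_query, homologues, weights, start, end):
    """Weighted fraction of homologues whose aligned window still matches pattern.

    homologues: name -> aligned sequence (same alignment as aligned_query).
    weights: name -> non-negative weight (hypothetical stand-in for the tree).
    """
    regex = re.compile(pattern)
    total = sum(weights[name] for name in homologues)
    if total == 0:
        return 0.0
    conserved = 0.0
    for name, aligned_seq in homologues.items():
        window = aligned_segment(aligned_query, aligned_seq, start, end)
        residues = "".join(c for c in window if c not in GAP_CHARS)
        # search() tolerates insertions in the homologue inside the window.
        if regex.search(residues):
            conserved += weights[name]
    return conserved / total

# Toy alignment (hypothetical data): query plus three homologues.
query = "MK-SPLE"
homologues = {"A": "MKASPLE", "B": "MK-TPIE", "C": "MR-AALE"}
weights = {"A": 1.0, "B": 2.0, "C": 1.0}
for m in re.finditer("[ST]P", query.replace("-", "")):
    print(m.start(), conservation_score("[ST]P", query, homologues, weights, m.start(), m.end()))
    # prints: 2 0.75
```

Homologues A and B keep an S/T-P pair under the query's `SP`, while C does not, so 3.0 of the total weight 4.0 is conserved. A real implementation would also need to handle homologues that are too divergent or gapped to align reliably over the motif.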

Highlights

The structure of many eukaryotic cell regulatory proteins is highly modular
The conservation score improves the prediction of linear motifs by discarding matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences
The Web Service implementation takes a protein sequence as input and returns the list of all predicted instances with their positions and conservation score (CS)

Summary

Introduction

The structure of many eukaryotic cell regulatory proteins is highly modular. They are assembled from globular domains, segments of natively disordered polypeptide and short linear motifs. The latter are involved in protein interactions and the formation of regulatory complexes. The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of their modules. Linear motifs (LMs) are short amino acid sequences (3–10 residues) involved in numerous interactions, including the modification-based regulation of protein function [1]. Prediction of new LM instances by pattern matching is prone to produce a high proportion of false positives, because such short, degenerate patterns also match frequently by chance.
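
A quick way to see why pattern matching alone over-predicts is to compare the matches of a short, degenerate motif regex in a random sequence with the number expected by chance. The sketch below assumes a uniform amino acid composition (real proteomes are not uniform) and supports only a fixed-length subset of regex syntax (`.`, single residues, `[..]` and `[^..]` classes). The helper names and the `[ST]P` example pattern are illustrative, not part of the published method.

```python
import random
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN = re.compile(r"\[\^?[A-Z]+\]|[A-Z.]")

def position_sets(pattern: str) -> list[set[str]]:
    """Split a fixed-length motif regex into the residues allowed at each position."""
    tokens = TOKEN.findall(pattern)
    if "".join(tokens) != pattern:
        raise ValueError(f"unsupported pattern syntax: {pattern!r}")
    sets = []
    for tok in tokens:
        if tok == ".":
            sets.append(set(AMINO_ACIDS))
        elif tok.startswith("[^"):
            sets.append(set(AMINO_ACIDS) - set(tok[2:-1]))
        elif tok.startswith("["):
            sets.append(set(tok[1:-1]))
        else:
            sets.append({tok})
    return sets

def find_matches(pattern: str, sequence: str) -> list[int]:
    """Start positions of all matches, including overlapping ones (via lookahead)."""
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", sequence)]

def expected_matches(pattern: str, length: int) -> float:
    """Expected chance matches in a sequence of given length, uniform composition."""
    sets = position_sets(pattern)
    p = 1.0
    for allowed in sets:
        p *= len(allowed) / len(AMINO_ACIDS)
    return max(length - len(sets) + 1, 0) * p

rng = random.Random(0)
random_protein = "".join(rng.choices(AMINO_ACIDS, k=1000))
print(len(find_matches("[ST]P", random_protein)), expected_matches("[ST]P", 1000))
```

Under these assumptions a two-residue pattern like `[ST]P` is expected about 5 times per 1,000 random residues, so every protein of moderate length carries several matches regardless of function; a conservation filter is one way to separate these from real instances.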

Methods

Results

Discussion

Conclusion