A structure filter for the Eukaryotic Linear Motif Resource

Allegra Via,Christine Gemünd,Cathryn M Gould,Manuela Helmer-Citterich,Toby J Gibson

doi:10.1186/1471-2105-10-351

Abstract

Background: Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. Linear motifs, short sequence modules functioning independently of protein tertiary structure, are most abundant in natively disordered polypeptides but are also found in accessible parts of globular domains, such as exposed loops. The prediction of novel occurrences of known linear motifs attempts the difficult task of distinguishing functional matches from stochastically occurring non-functional matches. Although functionality can only be confirmed experimentally, confidence in a putative motif is increased if the motif exhibits attributes associated with functional instances, such as occurrence in the correct taxonomic range and cellular compartment, conservation in homologues, and accessibility to interacting partners. Several tools now use these attributes to classify putative motifs based on confidence of functionality.

Results: Current methods assessing motif accessibility do not consider much of the information available, either predicting accessibility from primary sequence or regarding any motif occurring in a globular region as low confidence. We present a method considering accessibility and secondary structural context derived from experimentally solved protein structures to rectify this situation. Putatively functional motif occurrences are mapped onto a representative domain, given that a high quality reference SCOP domain structure is available for the protein itself or a close relative. Candidate motifs can then be scored for solvent-accessibility and secondary structure context. The scores are calibrated on a benchmark set of experimentally verified motif instances compared with a set of random matches. A combined score yields 3-fold enrichment for functional motifs assigned to high confidence classifications and 2.5-fold enrichment for random motifs assigned to low confidence classifications. The structure filter is implemented as a pipeline accessible both through a graphical interface via the ELM resource and through a Web Service protocol.

Conclusion: New occurrences of known linear motifs require experimental validation as the bioinformatics tools currently have limited reliability. The ELM structure filter will aid users assessing candidate motifs presenting in globular structural regions. Most importantly, it will help users to decide whether to expend their valuable time and resources on experimental testing of interesting motif candidates.

Highlights

Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides
The Eukaryotic Linear Motif (ELM) structure filter will aid users assessing candidate motifs presenting in globular structural regions
The ELM structure filter scoring scheme: structural analysis of true motif instances annotated in ELM supported what is expected from Linear Motif (LM) biology [3], i.e. that they tend to lie on the surface of protein domains and prefer unstructured and loop regions (see below, "Analysis of the ELM 3D benchmarking dataset")
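
The idea of scoring a candidate motif by solvent accessibility and secondary structure context can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Residue` type, the per-state values in `SS_SCORE`, the averaging over motif positions and the equal weighting are hypothetical placeholders, not the calibrated scores of the ELM structure filter.

```python
from dataclasses import dataclass

# Hypothetical per-state scores for DSSP-style secondary-structure codes.
# Placeholder values only: they are not the calibrated ELM scores.
SS_SCORE = {"H": 0.2, "E": 0.1, "T": 0.8, "S": 0.8, "-": 1.0}


@dataclass
class Residue:
    rel_accessibility: float  # relative solvent accessibility in [0, 1]
    ss: str                   # one-letter secondary-structure code


def accessibility_score(residues):
    """Mean relative solvent accessibility over the motif positions."""
    return sum(r.rel_accessibility for r in residues) / len(residues)


def secondary_structure_score(residues):
    """Mean secondary-structure score; unknown codes get a neutral 0.5."""
    return sum(SS_SCORE.get(r.ss, 0.5) for r in residues) / len(residues)


def combined_score(residues, weight=0.5):
    """Weighted mean of the two scores (the weight is an assumption)."""
    return (weight * accessibility_score(residues)
            + (1 - weight) * secondary_structure_score(residues))


if __name__ == "__main__":
    exposed_loop = [Residue(0.8, "-") for _ in range(4)]
    buried_helix = [Residue(0.1, "H") for _ in range(4)]
    print("exposed loop:", combined_score(exposed_loop))
    print("buried helix:", combined_score(buried_helix))
```

Under these placeholder values, a motif in an exposed loop scores higher than one buried in a helix, which matches the qualitative expectation stated above.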

Summary

Introduction

Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. In recent years it has become clear that proteins with highly modular architectures possess numerous short peptide motifs that are essential to their function [1,2,3,4,5]. Such peptides are termed Linear Motifs (LM) as, in contrast to the globular domains, their function is independent of tertiary structure and encoded solely by the amino acid sequence. In order to deconvolute the functional components of modular protein architectures, it is necessary to identify the set of LMs as well as the folded components. This is not straightforward because simple searches with short sequence patterns, known to act as functional modules, are uninformative, returning a flood of false positive matches. Sequence conservation has been shown to be effective in up-weighting true motifs relative to false positive matches [31,32,33].
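
The false positive problem with short patterns can be illustrated with a small sketch. It assumes a simplified proline-rich pattern, `P..P`, chosen only as an example of a short motif-like regular expression (it is not taken from the ELM resource), and a crude null model of uniformly random amino acid sequences. The helper names `find_matches` and `random_sequence` are hypothetical.

```python
import random
import re

# Illustrative pattern only: a simplified proline-rich "PxxP" core.
# The lookahead lets overlapping matches be reported as well.
PATTERN = re.compile(r"(?=(P..P))")

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_matches(sequence, pattern=PATTERN):
    """Return (start, matched_text) for every match, including overlaps."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


def random_sequence(length, rng):
    """Uniformly random sequence over the 20 standard amino acids."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 1000
    hits = sum(len(find_matches(random_sequence(500, rng)))
               for _ in range(trials))
    # With two fixed positions, each start position matches by chance with
    # probability (1/20)**2 under this null model.
    print(f"mean chance matches per 500-residue sequence: {hits / trials:.2f}")
```

Even this tiny pattern is expected to match about once per few hundred residues purely by chance, which is why additional evidence such as conservation or structural context is needed to rank candidates.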

Objectives

Methods

Results

Discussion

Conclusion