Low-complexity regions within protein sequences have position-dependent roles

Alain Coletta, Steve R Pettifer, James Marsh, Teresa K Attwood, David Y Weiss Solís, John W Pinney

doi:10.1186/1752-0509-4-43

Abstract

Background: Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families and ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs to explore their possible functional significance, in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.

Results: In keeping with previous results, we found that LCR-containing proteins tend to have more binding partners across different PPI networks than proteins that have no LCRs. More specifically, our study suggests i) that LCRs are preferentially positioned towards the protein sequence extremities and that, in contrast with centrally located LCRs, such terminal LCRs show a correlation between their lengths and degrees of connectivity, and ii) that centrally located LCRs are enriched in transcription-related GO terms, while terminal LCRs are enriched in translation- and stress response-related terms.

Conclusions: Our results suggest not only that LCRs may be involved in flexible binding associated with specific functions, but also that their positions within a sequence may be important in determining both their binding properties and their biological roles.
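The abstract reports that central and terminal LCRs are enriched in different GO terms but does not state, in this excerpt, which statistical test was used. A common choice for this kind of question is a one-sided hypergeometric test, sketched below as an assumption rather than as the authors' procedure. The function name `hypergeom_enrichment_p` and its arguments are hypothetical: `N` is the number of annotated background proteins, `K` the number of those carrying a given GO term, `n` the size of the test set (for example, proteins with a central LCR) and `k` the number of test-set proteins carrying the term.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    Hypothetical helper, not taken from the paper:
      N: annotated proteins in the background set
      K: background proteins annotated with the GO term
      n: proteins in the test set (e.g. proteins with a central LCR)
      k: test-set proteins annotated with the GO term
    """
    upper = min(K, n)  # the overlap cannot exceed either set size
    # Sum the probability mass of every overlap at least as large as k;
    # exact integer arithmetic until the final division.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Made-up counts for illustration only (not data from this study)
    p = hypergeom_enrichment_p(k=30, n=200, K=300, N=5000)
    print(f"p = {p:.3g}")
```

When many GO terms are tested at once, the resulting p-values would normally also need a multiple-testing correction; the sketch leaves that step out.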

Highlights

Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe
The Filtered Yeast Interactome (FYI) [19] was generated as the union of: yeast two-hybrid experiments [23,24,25]; datasets produced from affinity purification and mass spectrometry screens [26,27]; one dataset produced by in silico computational prediction methods [28]; the physical protein-protein interactions, excluding interactions from genome-scale experiments, from the Munich Information Center for Protein Sequences (MIPS) [29] Comprehensive Yeast Genome Database (CYGD) dataset [30]; and the CYGD protein complexes published in the literature
We have shown the length of low-complexity regions (LCRs) to be positively correlated with the number of binding partners, but only for LCRs located at the sequence extremities
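The position-dependent correlation in the last highlight can be illustrated with a short sketch: label each LCR as terminal or central, then compute Spearman's rank correlation between LCR length and the host protein's PPI degree within each class. The excerpt does not say how "terminal" is defined or which correlation statistic was used, so the rule below (LCR midpoint within the first or last 20% of the sequence) and the choice of Spearman's rho are assumptions. `LCR`, `position_class`, `length_degree_rho` and `terminal_fraction` are hypothetical names.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class LCR:
    start: int           # 0-based start of the LCR in its protein
    end: int             # exclusive end of the LCR
    protein_length: int  # length of the host protein sequence
    degree: int          # number of PPI partners of the host protein

    @property
    def length(self) -> int:
        return self.end - self.start


def position_class(lcr: LCR, terminal_fraction: float = 0.2) -> str:
    """'terminal' if the LCR midpoint lies in the first or last
    terminal_fraction of the sequence, else 'central' (hypothetical cut-off)."""
    rel_mid = (lcr.start + lcr.end) / 2 / lcr.protein_length
    if rel_mid < terminal_fraction or rel_mid > 1 - terminal_fraction:
        return "terminal"
    return "central"


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    if n < 2:
        return float("nan")  # too few points to correlate
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (sx * sy)


def length_degree_rho(lcrs, cls, terminal_fraction=0.2):
    """Correlate LCR length with host-protein degree within one position class."""
    subset = [r for r in lcrs if position_class(r, terminal_fraction) == cls]
    return spearman([r.length for r in subset], [r.degree for r in subset])


if __name__ == "__main__":
    # Toy, made-up records for illustration only (not data from this study)
    lcrs = [
        LCR(0, 10, 200, 3),
        LCR(185, 200, 200, 8),
        LCR(0, 30, 300, 12),
        LCR(90, 110, 200, 5),
        LCR(140, 150, 300, 9),
    ]
    print("terminal:", length_degree_rho(lcrs, "terminal"))
    print("central:", length_degree_rho(lcrs, "central"))
```

A rank correlation is used here because PPI degree distributions are typically heavily skewed. Note that a protein carrying several LCRs contributes its degree once per LCR in this sketch; a real analysis would have to decide how to handle that.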

Summary

Introduction

Low-complexity regions (LCRs) in protein sequences are regions containing little diversity in their amino acid composition. Such regions of biased composition are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families and ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. As a result, LCRs have proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. This work defines an LCR computationally as a stretch of amino acid sequence with low information content (see Methods).
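The exact definition of "low information content" is deferred to the Methods, which are not part of this excerpt. As an illustration only, the sketch below flags sliding windows whose Shannon entropy of residue composition falls at or below a threshold, then merges overlapping windows into regions. The window of 12 residues and the 2.2-bit threshold echo the commonly cited default trigger parameters of the SEG program, but the sketch omits SEG's extension and optimisation steps and is not claimed to be the procedure used in this work. `shannon_entropy` and `low_complexity_regions` are hypothetical names.

```python
from collections import Counter
from math import log2


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged half-open (start, end) intervals covered by windows whose
    entropy is at or below threshold (assumed parameters, SEG-like trigger only)."""
    if len(seq) < window:
        return []
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            start, end = i, i + window
            # Windows are scanned left to right, so only the last region can overlap
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Synthetic sequence: a poly-Q stretch flanked by diverse residues
    flank = "ACDEFGHIKLMNPRSTVWY"
    seq = flank + "Q" * 20 + flank
    print(low_complexity_regions(seq))
```

Because whole windows are reported, the flagged region can extend a few residues beyond the homopolymer itself; SEG addresses this with a further optimisation step that this sketch does not reproduce.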

Methods

Results

Conclusion