Abstract

Background: The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts – portions of the chains that come to close distance in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins – natural systems for which the identification of folding domains remains challenging.

Results: We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families.

Conclusions: The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0648-3) contains supplementary material, which is available to authorized users.

Highlights

  • The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts – portions of the chains that come to close distance in folded structural ensembles

  • Direct coupling analysis of repeat proteins: We obtained sequences of single repeated units for the families listed in Additional file 1: Table S1 from the PFAM database, version 27.0 [32]

  • One could be tempted to disregard the results for the i, i + L0 positions, arguing that these are caused by the repetitive nature of the system
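
The i, i + L0 effect mentioned in the highlights can be illustrated with a toy calculation. The sketch below is not the authors' direct coupling analysis: it uses plain mutual information on a synthetic alignment, and the alphabet, the repeat length `L0`, the mutation rate and the generative model (each sequence is two noisy copies of its own random repeat unit) are all assumptions made up for illustration. Under those assumptions, columns separated by exactly one repeat length come out strongly correlated, while other column pairs do not.

```python
import math
import random
from collections import Counter

ALPHABET = "ACDE"  # hypothetical reduced alphabet, for illustration only
L0 = 5             # hypothetical repeat-unit length


def make_repeat_pair(rng, mutation_rate=0.1):
    """Return two noisy copies of one random repeat unit, concatenated."""
    unit = [rng.choice(ALPHABET) for _ in range(L0)]

    def noisy_copy():
        # Each position is resampled uniformly with probability mutation_rate.
        return [rng.choice(ALPHABET) if rng.random() < mutation_rate else a
                for a in unit]

    return noisy_copy() + noisy_copy()


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    f_i = Counter(seq[i] for seq in alignment)
    f_j = Counter(seq[j] for seq in alignment)
    f_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


rng = random.Random(0)  # fixed seed so the toy alignment is reproducible
alignment = [make_repeat_pair(rng) for _ in range(2000)]

mi_repeat = mutual_information(alignment, 0, L0)     # columns i and i + L0
mi_other = mutual_information(alignment, 0, L0 + 1)  # columns i and i + L0 + 1

print(f"MI(0, L0)     = {mi_repeat:.3f}")
print(f"MI(0, L0 + 1) = {mi_other:.3f}")
```

In this toy model the large value at separation L0 comes purely from the repeat construction and carries no contact information, which is the kind of bias that would need to be equalized before interpreting couplings at that separation.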


Summary

Introduction

The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts – portions of the chains that come to close distance in folded structural ensembles. The fact that many protein molecules spontaneously collapse stretches of amino acid chains into defined structural domains [1] facilitates the description, evolution and construction of these peculiar physical objects. Repeat proteins represent close to 6% of the polypeptide sequences encoded in eukaryotic genomes [7]. These have been broadly classified into groups according to the length of the minimal repeating unit [8]. Open problems remain in quantitatively defining the repeat protein families, the number and location of the repeat occurrences, and the grouping of these into repeat-arrays.
