Abstract

Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes, comprising thousands of motif instances, are known. Our theory estimates how many motif classes remain undiscovered. As is commonly done, we describe motif classes as regular expressions that specify the motif length and the amino acids allowed at each motif position. We measure motif specificity for a pair of motif classes by counting the motif-discriminating positions that prevent a protein subsequence from matching both classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a given number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes most often present a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications, and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.
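
To make the notion of a motif-discriminating position concrete, the following minimal Python sketch counts such positions for two fixed-length motif regular expressions. It rests on assumptions not stated in the abstract: a position is treated as discriminating when the two classes allow disjoint sets of amino acids there, both classes have the same length, and only single letters, `.` wildcards, and bracket classes are supported. The function names and the example patterns are hypothetical illustrations, not entries from the ELM database or the authors' implementation.

```python
import re

# The twenty standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parse_motif(pattern):
    """Turn a simple fixed-length motif regex into one allowed-residue set per position.

    Supported tokens (an assumption of this sketch): a single letter,
    the '.' wildcard, and bracket classes such as [ST] or [^P].
    """
    tokens = re.findall(r"\[\^?[A-Z]+\]|[A-Z.]", pattern)
    positions = []
    for tok in tokens:
        if tok == ".":
            positions.append(set(AMINO_ACIDS))
        elif tok.startswith("[^"):
            positions.append(set(AMINO_ACIDS) - set(tok[2:-1]))
        elif tok.startswith("["):
            positions.append(set(tok[1:-1]))
        else:
            positions.append({tok})
    return positions

def discriminating_positions(pattern_a, pattern_b):
    """Count positions where the two classes share no allowed amino acid.

    Assumed definition: if at least one such position exists, no single
    subsequence can match both classes at once.
    """
    pos_a, pos_b = parse_motif(pattern_a), parse_motif(pattern_b)
    if len(pos_a) != len(pos_b):
        raise ValueError("this sketch only compares motifs of equal length")
    return sum(1 for a, b in zip(pos_a, pos_b) if not (a & b))

# Hypothetical patterns: positions 2 and 4 have disjoint allowed sets.
print(discriminating_positions("[ST]P.[KR]", "[ST]A.[DE]"))  # 2
# Every position overlaps, so one subsequence (e.g. "LAT") matches both.
print(discriminating_positions("L.[ST]", "[LIV].T"))         # 0
```

Under this assumed definition, a count of zero means the two classes can share instances, while a count of one corresponds to the mild specificity that the abstract reports as most common among natural pairs of classes.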

Highlights

  • Are our results affected by biases in the eukaryotic linear motifs (ELMs) database and the use of regular expressions? Our results may be affected by several caveats

  • The ELM database is an incomplete sample of the existing motif classes

Introduction

Natural proteins are synthesized as linear polymers from an alphabet of twenty amino acids, which may later be expanded through post-translational modifications. The proteome is the entire set of proteins that is, or potentially could be, expressed by an organism. Proteins present remarkable physicochemical properties that are strongly linked to the biological processes they take part in. In some cases, such as enzyme catalysis or folding into globular domains, these properties can be assigned to a defined region of the protein sequence.
