Abstract

Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked in one motif that varied in three key parameters: (i) number of occurrences, (ii) length, and (iii) number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark with an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as the sole input, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, even though none of these properties was used as prior information.
MotifHound is available at http://michnick.bcm.umontreal.ca or http://tinyurl.com/motifhound, together with the benchmark, which can be used as a reference to assess future developments in motif discovery.
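
The enumerate-then-score idea described in the abstract can be illustrated with a minimal Python sketch. This is not MotifHound's implementation: the function names, the "." wildcard symbol, the restriction of wildcards to interior positions, and the choice to count each protein at most once per motif are all assumptions made for illustration. The sketch enumerates every fixed-length motif (with its degenerate forms) in a toy set of proteins and computes a hypergeometric tail probability for each motif against a toy background.

```python
from itertools import combinations
from math import comb

WILDCARD = "."  # assumed symbol for a degenerate position


def degenerate_forms(kmer, max_wildcards):
    """Yield kmer plus every variant with up to max_wildcards interior
    positions replaced by a wildcard (first and last residues stay fixed;
    this restriction is an assumption of the sketch)."""
    interior = range(1, len(kmer) - 1)
    for n in range(max_wildcards + 1):
        for positions in combinations(interior, n):
            chars = list(kmer)
            for p in positions:
                chars[p] = WILDCARD
            yield "".join(chars)


def motifs_in(sequence, length, max_wildcards):
    """Set of all exact and degenerate motifs of one length in a sequence."""
    found = set()
    for i in range(len(sequence) - length + 1):
        found.update(degenerate_forms(sequence[i:i + length], max_wildcards))
    return found


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, K of which carry the motif."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def score_motifs(query, background, length, max_wildcards):
    """Map each motif seen in the query proteins to its tail probability.
    `background` maps protein name -> sequence; `query` lists names in it."""
    per_protein = {name: motifs_in(seq, length, max_wildcards)
                   for name, seq in background.items()}
    candidates = set().union(*(per_protein[name] for name in query))
    scores = {}
    for motif in candidates:
        K = sum(motif in found for found in per_protein.values())
        k = sum(motif in per_protein[name] for name in query)
        scores[motif] = hypergeom_tail(k, len(query), K, len(background))
    return scores


# Toy data (made up): two query proteins share the degenerate motif P.LP.
background = {"p1": "MAPPLPK", "p2": "GGPALPS", "p3": "MKTAYIA",
              "p4": "QQWERTY", "p5": "AAGGSSH", "p6": "LLKKDDE"}
scores = score_motifs(["p1", "p2"], background, length=4, max_wildcards=1)
best = min(scores, key=scores.get)
print(best, scores[best])
```

In this toy example the exact 4-mers PPLP and PALP each occur in only one query protein, so only the degenerate form P.LP is shared by both, which is why enumerating degenerate forms matters. A real analysis would also need a correction for the large number of motifs tested, which this sketch omits.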

Highlights

  • Linear motifs in proteins play key roles in molecular recognition [1,2,3]

  • All these algorithms are assessed below in the identification of linear motifs experimentally determined to bind to the FUS1 SH3 domain from S. cerevisiae

  • We found that the motifs predicted by FIRE-Pro, SLiMFinder, qPMS7, and MotifHound exhibit the largest overlap with experimentally characterized binding sites, with an advantage to MotifHound

Summary

Introduction

Linear motifs in proteins play key roles in molecular recognition [1,2,3]. They mediate diverse functions including ion coordination [4], protein localization [2,5], protein cleavage [2], protein assembly through scaffolding [1,2,6,7], protein post-translational modification [2,5], and, more generally, signal transduction [1]. Linear motifs are typically 3 to 10 amino acids long, though only a few residues (~1/3) are generally conserved due to their importance in motif recognition [10,11]. This short length and degenerate nature make their discovery a difficult problem, yet their functional importance and widespread occurrence stress the need for methods to help in their ab initio discovery.

