Abstract

Background

Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as two sites may be important for activity, making identification of novel SLiMs extremely difficult. In particular, it can be very difficult to distinguish a randomly recurring “motif” from a truly over-represented one. Incorporating ambiguous amino acid positions and/or variable-length wildcard spacers between defined residues further complicates the matter.

Methodology/Principal Findings

In this paper we present two algorithms. SLiMBuild identifies convergently evolved, short motifs in a dataset of proteins. Motifs are built by combining dimers into longer patterns, retaining only those motifs occurring in a sufficient number of unrelated proteins. Motifs with fixed amino acid positions are identified and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. The algorithm is computationally efficient compared to alternatives, particularly when datasets include homologous proteins, and provides great flexibility in the nature of motifs returned. The SLiMChance algorithm estimates the probability of returned motifs arising by chance, correcting for the size and composition of the dataset, and assigns a significance value to each motif. These algorithms are implemented in a software package, SLiMFinder. SLiMFinder default settings identify known SLiMs with 100% specificity, and have a low false discovery rate on random test data.

Conclusions/Significance

The efficiency of SLiMBuild and low false discovery rate of SLiMChance make SLiMFinder highly suited to high throughput motif discovery and individual high quality analyses alike. Examples of such analyses on real biological data, and how SLiMFinder results can help direct future discoveries, are provided.
SLiMFinder is freely available for download under a GNU license from http://bioinformatics.ucd.ie/shields/software/slimfinder/.
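
The dimer-combining idea described for SLiMBuild can be illustrated with a small sketch. This is not the SLiMFinder implementation: the function names, the `max_gap` and `max_defined` parameters, and the toy sequences are all assumptions made for illustration. The sketch also treats every input sequence as an unrelated protein and handles only fixed positions separated by fixed-length wildcards (written as `.`), so homology grouping, amino acid ambiguity, and variable-length spacers are left out.

```python
from collections import defaultdict


def support(occurrences):
    """Number of distinct sequences in which a pattern occurs."""
    return len({seq_index for seq_index, _ in occurrences})


def build_motifs(seqs, min_support=2, max_gap=2, max_defined=3):
    """Grow fixed-position patterns from dimers (hypothetical sketch).

    A dimer is two residues separated by 0..max_gap wildcards. A longer
    pattern is made by joining a kept pattern to a dimer that starts at
    the pattern's last residue. Patterns seen in fewer than min_support
    sequences are discarded at every step.
    """
    # Step 1: collect every dimer with its (sequence, start) occurrences.
    current = defaultdict(set)
    for s, seq in enumerate(seqs):
        for i in range(len(seq)):
            for gap in range(max_gap + 1):
                j = i + gap + 1
                if j < len(seq):
                    current[seq[i] + "." * gap + seq[j]].add((s, i))
    current = {p: o for p, o in current.items() if support(o) >= min_support}
    kept = dict(current)

    # Step 2: extend surviving patterns by one defined residue at a time.
    for _ in range(max_defined - 2):
        longer = defaultdict(set)
        for pattern, occurrences in current.items():
            for s, start in occurrences:
                end = start + len(pattern) - 1  # index of last residue
                for gap in range(max_gap + 1):
                    j = end + gap + 1
                    if j < len(seqs[s]):
                        longer[pattern + "." * gap + seqs[s][j]].add((s, start))
        current = {p: o for p, o in longer.items() if support(o) >= min_support}
        kept.update(current)

    return {pattern: support(o) for pattern, o in kept.items()}


# Toy input (invented): three sequences sharing a P.LT pattern.
example = ["AAPKLTQ", "GGPSLTW", "MMPRLTY"]
motifs = build_motifs(example, min_support=3)
```

Filtering by support after every extension is what keeps the search small: a pattern can only reach the support threshold if the shorter pattern it was grown from already did.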

Highlights

  • Protein-protein interactions are of fundamental importance in biology

  • The SLiMChance algorithm we present here improves on these scores by making a crude but effective adjustment of motif probabilities by considering the total number of motifs in the motif-space considered by SLiMBuild

  • Random data matches the calculated expectation quite closely, with approximately 10% of datasets yielding a significance of 0.1 or lower and 1% of datasets yielding a significance of 0.01 or lower. This relationship begins to deviate as the p-value increases, but this is not a concern because the deviations occur within the non-significant portion of the data and will not affect the results
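
The motif-space adjustment mentioned in the highlights above can be sketched in a few lines. The text here does not give the formula SLiMChance uses, so the following is an assumption: a binomial probability of a motif reaching its observed support, followed by a correction of the form 1 - (1 - p)^B for a motif space of B motifs. The function names and all example numbers are hypothetical.

```python
from math import comb, expm1, log1p


def prob_support(p_seq, n_seqs, k):
    """Binomial probability that a motif occurs in at least k of n_seqs
    sequences when it occurs in any one sequence with probability p_seq."""
    return sum(
        comb(n_seqs, i) * p_seq**i * (1 - p_seq) ** (n_seqs - i)
        for i in range(k, n_seqs + 1)
    )


def corrected_significance(p_motif, motif_space):
    """Probability that at least one of motif_space motifs does as well
    as p_motif by chance, i.e. 1 - (1 - p_motif) ** motif_space.

    Assumed form of the correction, written with log1p/expm1 so that
    very small probabilities are not lost to rounding.
    """
    if p_motif >= 1.0:
        return 1.0
    return -expm1(motif_space * log1p(-p_motif))


# Invented numbers: per-sequence chance 0.01, support 5 of 10 sequences,
# and a motif space of 100,000 candidate motifs.
p_motif = prob_support(0.01, 10, 5)
significance = corrected_significance(p_motif, 100_000)
```

If a correction of this kind is well calibrated, a significance threshold of 0.1 should be met by roughly 10% of random datasets, which is the behaviour the highlights report for SLiMChance.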


Summary

Introduction

Protein-protein interactions are of fundamental importance in biology. Although many well-characterised interactions are mediated by large domain-domain interfaces, it is estimated that 15%–40% of interactions may be mediated by a short, linear motif (SLiM) in one of the binding partners [1,2]. Existing methods for identifying new SLiMs [7,8] explicitly invoke a model of convergent evolution to identify over-represented sequence patterns. These methods rely on an initial motif discovery phase using the generic pattern-finding software TEIRESIAS [9], which returns all shared patterns regardless of evolutionary relationships and with only crude length and complexity control. The SLiMChance algorithm estimates the probability of returned motifs arising by chance, correcting for the size and composition of the dataset, and assigns a significance value to each motif. The efficiency of SLiMBuild and low false discovery rate of SLiMChance make SLiMFinder highly suited to high throughput motif discovery and individual high quality analyses alike. Examples of such analyses on real biological data, and how SLiMFinder results can help direct future discoveries, are provided. SLiMFinder is freely available for download under a GNU license from http://bioinformatics.ucd.ie/shields/software/slimfinder/.

