Abstract

Background

PDZ domains are among the most promiscuous protein recognition modules; they bind short linear peptides and play an important role in cellular signaling. Recently, a few high-throughput techniques (e.g. protein microarray screens, phage display) have been applied to determine the in vitro binding specificity of PDZ domains. Many computational methods are currently available to predict PDZ-peptide interactions, but they often provide domain-specific models and/or have limited domain coverage.

Results

Here, we compiled the largest set of PDZ domains derived from the human, mouse, fly and worm proteomes and defined binding models for PDZ domain families to improve domain coverage and prediction specificity. For that purpose, we first identified a novel set of 138 PDZ families, comprising 548 PDZ domains from the aforementioned organisms, by clustering the domains according to their sequence identity. For 43 PDZ families, covering 226 PDZ domains with available interaction data, we built specialized models using a support vector machine approach. The advantage of family-wise models is that they can also be used to determine the binding specificity of a newly characterized PDZ domain that has sufficient sequence identity to a known family. Since most current experimental approaches provide only positive data, we have to cope with the class imbalance problem. To enrich the negative class, we therefore introduced a semi-supervised technique to generate high-confidence non-interaction data. We report competitive predictive performance with respect to state-of-the-art approaches.

Conclusions

Our approach makes several contributions. First, we show that domain coverage can be increased by applying an accurate clustering technique. Second, we developed an approach based on a semi-supervised strategy to obtain high-confidence negative data. Third, we allowed high-order correlations between the amino acid positions in the binding peptides. Fourth, our method is general and should be readily applicable to other peptide recognition modules such as SH2 domains. Finally, we performed a genome-wide prediction for 101 human and 102 mouse PDZ domains and uncovered novel interactions with biological relevance. We make all the predictive models and genome-wide predictions freely available to the scientific community.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-S1-S5) contains supplementary material, which is available to authorized users.

Highlights

  • PDZ domains are among the most promiscuous protein recognition modules; they bind short linear peptides and play an important role in cellular signaling.

  • We clustered similar PDZ domains based on their sequence identity using the Markov clustering algorithm (MCL) [32].

  • MCL is a fast and powerful algorithm for clustering biological sequences. The cutoff was set to 50% sequence identity because previous research showed that PDZ domains with more than 50% sequence identity have similar binding specificity [27].
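The clustering step described above can be illustrated with a toy re-implementation of the MCL expansion/inflation loop. This is only a sketch under stated assumptions, not the authors' pipeline: the function name `mcl_clusters`, the inflation value of 2.0, the iteration count, and the 5-domain identity matrix are all hypothetical, and real analyses would normally use the published MCL tool rather than hand-written code.

```python
import numpy as np

def mcl_clusters(identity, cutoff=0.5, inflation=2.0, iterations=50):
    """Toy Markov clustering on a symmetric pairwise identity matrix.

    cutoff mirrors the 50% identity threshold from the text; inflation
    and iterations are illustrative assumptions.
    """
    identity = np.asarray(identity, dtype=float)
    # Drop edges below the identity cutoff, then add self-loops.
    adj = np.where(identity >= cutoff, identity, 0.0)
    np.fill_diagonal(adj, 1.0)
    # Column-normalise so each column is a probability distribution.
    m = adj / adj.sum(axis=0, keepdims=True)
    for _ in range(iterations):
        m = m @ m                                 # expansion: two-step walk
        m = m ** inflation                        # inflation: favour strong flow
        m = m / m.sum(axis=0, keepdims=True)      # re-normalise columns
    # Columns that flow to the same set of attractor rows share a cluster.
    # (Simplification: partially overlapping attractor sets are not merged.)
    clusters = {}
    for col in range(m.shape[1]):
        attractors = frozenset(np.flatnonzero(m[:, col] > 1e-6).tolist())
        clusters.setdefault(attractors, []).append(col)
    return sorted(clusters.values())

# Hypothetical fractional identities between five domains.
identity = [
    [1.0, 0.8, 0.7, 0.2, 0.2],
    [0.8, 1.0, 0.6, 0.2, 0.2],
    [0.7, 0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 0.2, 1.0, 0.6],
    [0.2, 0.2, 0.2, 0.6, 1.0],
]
families = mcl_clusters(identity)
```

On this toy matrix, domains 0 to 2 and domains 3 to 4 end up in separate families because every cross-group identity falls below the cutoff. Raising the inflation parameter generally yields finer clusters, so the number of families obtained in practice depends on that choice as well as on the identity threshold.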

Summary

Introduction

PDZ domains are among the most widespread peptide recognition modules (PRMs). They are predominantly found in signaling proteins of multicellular organisms, are highly promiscuous in the short linear peptides they bind, and play an important role in the establishment of cell polarity, neuronal signaling and protein trafficking, among other processes [1,2,3]. PDZ domains have been grouped into specificity classes based on the structure of their target motifs: X[T/S]XΦ-COOH (Class I motif), XΦXΦ-COOH (Class II motif) and a minor X[D/E]XΦ-COOH (Class III motif), where X represents any natural amino acid and Φ represents a hydrophobic amino acid [9,12]. This classification system is an oversimplification, since every residue in the target peptide is known to contribute to the binding specificity [13,14].
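As a rough illustration of the three motif classes, the sketch below assigns a peptide to a class from its last four residues: S/T at position -2 for Class I, a hydrophobic residue at -2 for Class II, and D/E at -2 for Class III, each followed by any residue and a hydrophobic C-terminal residue. The membership of the hydrophobic set, the function name `classify_c_terminus`, and the example peptides are assumptions made for illustration; the text does not specify them.

```python
import re

# Assumed hydrophobic residues; the source does not list the set.
HYDROPHOBIC = "AVILMFWYC"

# Each pattern reads: any residue, class-defining residue at -2,
# any residue at -1, hydrophobic residue at the C-terminus.
MOTIFS = {
    "Class I": re.compile(rf".[ST].[{HYDROPHOBIC}]$"),
    "Class II": re.compile(rf".[{HYDROPHOBIC}].[{HYDROPHOBIC}]$"),
    "Class III": re.compile(rf".[DE].[{HYDROPHOBIC}]$"),
}

def classify_c_terminus(peptide):
    """Return the coarse motif class of a peptide's C-terminus."""
    peptide = peptide.strip().upper()
    for name, pattern in MOTIFS.items():
        if pattern.search(peptide):
            return name
    return "unclassified"

# Hypothetical peptides, written N-terminus to C-terminus.
examples = {p: classify_c_terminus(p) for p in ["AESDV", "AEVAL", "AVDTL", "AESDK"]}
```

Because only positions -2 and 0 decide the class here, two peptides that differ at every other position receive the same label, which is the oversimplification the paragraph above points out.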

