Abstract
A fairly large set of protein interactions is mediated by families of peptide binding domains, such as Src homology 2 (SH2), SH3, PDZ, and major histocompatibility complex. Identifying their ligands by experimental screening is not only labor-intensive but also almost futile for low abundance species, whose signals are suppressed by high abundance species. An ideal way of studying protein-protein interactions is to use high throughput computational approaches to screen protein sequence databases and so direct the validating experiments toward the most promising peptides. Predictors that merely perform well in cross-validation are not good enough to screen protein databases. In the current study we built integrated machine learning systems using three novel coding methods and screened the Swiss-Prot and GenBank protein databases for potential ligands of 10 SH3 and three PDZ domains. A large fraction of the predictions has already been experimentally confirmed by independent research groups, indicating a satisfying generalization capability for future applications in identifying protein interactions.
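One common reason why good cross-validation performance may not carry over to database screening is class imbalance: true ligands are rare among all candidate peptides in a sequence database. The sketch below illustrates this with Bayes' rule. The function name `screening_precision` and the numbers used (90% sensitivity, 90% specificity, a prevalence of 1 in 1000) are hypothetical values chosen for illustration; they are not taken from the study, and this is not the paper's definition of estimated screening precision.

```python
def screening_precision(sensitivity, specificity, prevalence):
    """Expected precision when true binders occur at `prevalence`.

    All three arguments are fractions between 0 and 1 (illustrative inputs).
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    total = true_pos + false_pos
    return true_pos / total if total > 0 else 0.0


# Balanced set, as is typical for a cross-validation benchmark (assumed 1:1).
balanced = screening_precision(0.9, 0.9, 0.5)

# Database-like setting where binders are rare (assumed 1 in 1000).
database = screening_precision(0.9, 0.9, 0.001)

print(f"balanced set: {balanced:.3f}")
print(f"database-like set: {database:.4f}")
```

Under these assumed numbers the same classifier yields far lower precision in the database-like setting, which is why a predictor intended for screening needs a much lower false positive rate than a balanced benchmark alone would suggest.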
Highlights
Machine learning approaches such as artificial neural networks [11, 12] and support vector machines (SVM) [13, 14] have been used in predicting …
Abbreviations: … precision; ESP, estimated screening precision; MHC, major histocompatibility complex; MCC, Matthews correlation coefficient; BLU, Boehringer light unit.
To illustrate the generalization capability of our method on different classes of ligands, we show the results for three domains as examples (Table IV).
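The Matthews correlation coefficient (MCC) named in the abbreviations is a standard summary of binary classification quality that remains informative when classes are imbalanced. A minimal sketch of its usual definition follows; the function name `mcc` and the example counts are hypothetical and do not reproduce any result from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only: 40 true binders found, 10 missed,
# 20 false alarms, 930 non-binders correctly rejected.
score = mcc(tp=40, tn=930, fp=20, fn=10)
print(f"MCC = {score:.3f}")
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction).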