Abstract
A fairly large set of protein interactions is mediated by families of peptide-binding domains, such as SH2, SH3, PDZ, and MHC. Identifying their ligands by experimental screening is not only labor-intensive but also almost futile for low-abundance species, because they are suppressed by high-abundance species. The ideal way of studying protein-protein interactions is to use high-throughput computational approaches to screen protein sequence databases and so direct the validating experiments toward the most promising peptides. Predictors that perform well only in cross-validation are not good enough to screen protein databases. In our method, only information relevant to the interaction was extracted: the domains of a family and their ligands were collected, aligned separately, and then combined into a prediction system. An integrated machine learning system was built using three novel coding methods and used to screen the Swiss-Prot and GenBank protein databases for ligands of 10 SH3 and 3 PDZ domains. A large proportion of the predictions have already been experimentally confirmed by independent research groups, indicating satisfactory generalization capability in protein interaction identification.
Highlights
A fairly large set of protein interactions is mediated by families of peptide-binding domains, such as Src homology 2 (SH2), SH3, PDZ, and major histocompatibility complex (MHC)
Machine learning approaches such as artificial neural networks [11, 12] and support vector machines (SVM) [13, 14] have been used in predicting [...]
To illustrate the generalization capability of our method on different classes of ligands, we show the results of three domains as examples (Table IV)
Summary
A fairly large set of protein interactions is mediated by families of peptide-binding domains, such as Src homology 2 (SH2), SH3, PDZ, and major histocompatibility complex (MHC). Identifying their ligands by experimental screening is not only labor-intensive but also almost futile for low-abundance species, because of suppression by high-abundance species. A more plausible way of studying protein-protein interactions is to use high-throughput computational predictions, rather than experimental approaches, to screen protein sequence databases for interactions and so direct the validating experiments toward the most promising peptides. Machine learning approaches such as artificial neural networks [11, 12] and support vector machines (SVM) [13, 14] have been used in predicting [...]
Abbreviations: ESP, estimated screening precision; MHC, major histocompatibility complex; MCC, Matthews correlation coefficient; BLU, Boehringer light unit.