An Integrated Machine Learning System to Computationally Screen Protein Databases for Protein Binding Peptide Ligands

Ling Zhang, Chen Shao, Dexian Zheng, Youhe Gao

doi:10.1074/mcp.m500346-mcp200

Abstract

A fairly large set of protein interactions is mediated by families of peptide binding domains, such as Src homology 2 (SH2), SH3, PDZ, and major histocompatibility complex. Identifying their ligands by experimental screening is not only labor-intensive but also almost futile for low abundance species, because of suppression by high abundance species. An ideal way of studying protein-protein interactions is to use high throughput computational approaches to screen protein sequence databases and direct the validating experiments toward the most promising peptides. Predictors that merely perform well in cross-validation are not good enough to screen protein databases. In the current study we built integrated machine learning systems using three novel coding methods and screened the Swiss-Prot and GenBank protein databases for potential ligands of 10 SH3 and three PDZ domains. A large fraction of the predictions has already been experimentally confirmed by independent research groups, indicating a satisfying generalization capability for future applications in identifying protein interactions.
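The abstract's remark that good cross-validation alone is not enough for database screening can be illustrated with a small calculation. The sketch below is not the authors' method and does not reproduce their screening-precision measure. It only shows, under assumed numbers, how the precision of a fixed classifier falls when true binders are rare among the screened peptides. The function name and all numeric values are hypothetical.

```python
def screening_precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Expected precision (positive predictive value) of a classifier applied
    to a pool in which a fraction `prevalence` of candidates are true binders."""
    true_pos = sensitivity * prevalence                    # binders correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # non-binders wrongly flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 95% sensitivity and 95% specificity.
# Balanced test set, as is typical in cross-validation (assumed 50% binders).
balanced = screening_precision(0.95, 0.95, 0.5)

# Database screen where binders are rare (assumed 1 in 10,000 candidates).
database = screening_precision(0.95, 0.95, 0.0001)

print(f"precision on balanced set:    {balanced:.3f}")
print(f"precision in database screen: {database:.4f}")
```

With these assumed values, the same classifier that is right 95% of the time on a balanced set yields a screen in which only about 0.2% of flagged peptides are true binders, which is why a predictor intended for database screening needs a far lower false-positive rate than cross-validation accuracy alone would suggest.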

Highlights

Machine learning approaches such as artificial neural networks [11, 12] and support vector machines (SVMs) [13, 14] have been used in predicting [...]
To illustrate the generalization capability of our method on different classes of ligands, we show the results for three domains as examples (Table IV).

Summary

Introduction

A fairly large set of protein interactions is mediated by families of peptide binding domains, such as Src homology 2 (SH2), SH3, PDZ, and major histocompatibility complex. Identifying their ligands by experimental screening is not only labor-intensive but also almost futile for low abundance species, because of suppression by high abundance species. A more plausible way of studying protein-protein interactions is to use high throughput computational predictions, rather than experimental approaches, to screen protein sequence databases and direct the validating experiments toward the most promising peptides. Machine learning approaches such as artificial neural networks [11, 12] and support vector machines (SVMs) [13, 14] have been used in predicting [...]

Abbreviations: [...] precision; ESP, estimated screening precision; MHC, major histocompatibility complex; MCC, Matthews correlation coefficient; BLU, Boehringer light unit.



Journal: Molecular & Cellular Proteomics
Publication Date: Jul 1, 2006
Citations: 53
License type: cc-by

