Abstract

Motivation: Protein–protein interactions (PPIs) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies.

Results: PPI annotations are built combinatorially using corresponding GO terms and InterPro annotations. We use an S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high area under the ROC curve of up to 0.97, outperforming go2ppi, a previously established GO-based PPI prediction tool.

Availability and implementation: https://github.com/ima23/maxent-ppi

Supplementary information: Supplementary data are available at Bioinformatics online.
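To make the idea of combinatorially built PPI annotations concrete, here is a minimal Python sketch. The abstract does not spell out the exact encoding, so the Cartesian-product scheme, the function name `pair_features`, and the example identifiers are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def pair_features(terms_a, terms_b):
    """Build order-independent term-pair features for one protein pair.

    Assumed scheme: every annotation term of protein A is combined with
    every annotation term of protein B, and each combination is sorted so
    that (A, B) and (B, A) yield the same feature set.
    """
    return {"|".join(sorted((a, b))) for a, b in product(terms_a, terms_b)}


# Hypothetical annotations mixing GO and InterPro identifiers.
annotations = {
    "P1": {"GO:0005634", "IPR000719"},
    "P2": {"GO:0005634", "GO:0006355"},
}

features = pair_features(annotations["P1"], annotations["P2"])
for feature in sorted(features):
    print(feature)
```

Under this assumed encoding, the number of distinct features grows with the product of the annotation counts of the two proteins, which is one way a feature space can become very large.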

Highlights

  • Despite their structural diversity, proteins only achieve their full potential by direct interaction in multi-protein complexes involved in fundamental biological processes such as gene expression, cell differentiation and cell–cell communication (Alberts, 1998; Bonetta, 2010; Vidal et al., 2011).

  • The performance of the GIS-MaxEnt and support vector machine (SVM) models was assessed on a D. melanogaster training set composed of 500 positive and 500 negative examples, described by 224 629 annotations based on InterPro and Gene Ontology (GO) terms.

  • GIS-MaxEnt applied to different annotation sets (Section 3.1.1): Of the GIS-MaxEnt models trained on the four individual data sources, the one trained on GO cellular component performed best, with a Matthews correlation coefficient (MCC) of 0.83; the one trained on biological process performed worst, with an MCC of 0.56 (Fig. 3 and Supplementary Table S6).
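For readers unfamiliar with the Matthews correlation coefficient used in the highlights, the sketch below computes it from confusion-matrix counts using the standard definition. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only.
score = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
print(f"MCC = {score:.3f}")
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it uses all four cells of the confusion matrix.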

Introduction

Protein interactions have been studied by low-throughput assays and associated analytical methods, including X-ray crystallography (Scott et al., 2009), nuclear magnetic resonance (NMR), surface plasmon resonance (SPR), fluorescence resonance energy transfer (FRET) and isothermal titration calorimetry (ITC). Such methods are reviewed in Collins and Choudhary (2008) and Shoemaker and Panchenko (2007). More recently, several mass spectrometry methods have been used to interrogate protein interactions in multi-protein complexes (Smits and Vermeulen, 2016). These structural proteomics approaches, including native mass spectrometry (Mehmood et al., 2015) and crosslinking mass spectrometry (Liu et al., 2015), complement high-resolution cryo-electron microscopy (Huis In ’t Veld et al., 2014).
