Abstract

Predicting protein–protein interactions (PPI) is an important challenge in structural bioinformatics. Current computational methods vary in accuracy when predicting these interactions. Several factors have been proposed to improve these predictions, including the choice of protein descriptors used to represent the interactions. In the current work, we provide a representation of protein structure, referred to as residue cluster classes, that is amenable to PPI classification using machine learning approaches. Through sampling and optimization, we identified the best algorithm–parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set, to reproduce the common situation in which a protein shares sequence similarity with others. We identified a model with almost no PPI classification error (96–99% of instances correctly classified) and showed that residue cluster classes of protein pairs display distinct patterns for positive and negative protein interactions. Our results indicate that residue cluster classes are structural features relevant to modeling PPI and provide a novel tool to mathematically model the protein structure/function relationship.

Highlights

  • Proteins perform many vital functions in living organisms, with most depending on interactions with other molecules

  • Positive examples of protein–protein interactions (PPI) were obtained from the three-dimensional interacting domains (3DID) database [22] and negative examples from the Negatome database [23]; only proteins in these sets with 3D structures deposited in the Protein Data Bank (PDB), the public repository of protein structures, were included in this study

  • We generated a complementary testing set that shared no positive PPI pairs with the training set but included the same protein family (PFAM) domains and the same negative PPI set
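
The complementary testing set described in the last highlight can be illustrated with a minimal sketch. It assumes, for simplicity, that each protein carries a single PFAM domain and that the first pair found for each domain combination is used for training. The function name `split_by_shared_domains`, the protein IDs, and the domain labels are all hypothetical placeholders rather than the authors' actual procedure or data. The negative PPI set is not split here, since the source states that it is shared between training and testing.

```python
from collections import defaultdict


def split_by_shared_domains(positive_pairs, domains):
    """Split positive PPI pairs into training and testing sets.

    positive_pairs: list of (protein_a, protein_b) tuples.
    domains: dict mapping a protein ID to one domain label
             (a simplifying assumption; real proteins may have several).
    """
    # Group protein pairs by the unordered pair of domains they carry.
    by_domain_pair = defaultdict(list)
    for a, b in positive_pairs:
        key = frozenset((domains[a], domains[b]))
        by_domain_pair[key].append((a, b))

    train, test = [], []
    # Iterate in a deterministic order so the split is reproducible.
    for key in sorted(by_domain_pair, key=sorted):
        pairs = sorted(by_domain_pair[key])
        # The first pair of each domain group goes to training; the
        # remaining pairs are held out, so every test pair is unseen
        # while its domain combination is represented in training.
        train.append(pairs[0])
        test.extend(pairs[1:])
    return train, test


# Toy data with placeholder identifiers (not real proteins or domains).
positive_pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
domains = {
    "P1": "DOM_A", "P2": "DOM_B",
    "P3": "DOM_A", "P4": "DOM_B",
    "P5": "DOM_C", "P6": "DOM_D",
}

train, test = split_by_shared_domains(positive_pairs, domains)
print("train:", train)
print("test:", test)
```

Grouping by domain combination before splitting is one simple way to guarantee both properties at once: no positive pair appears in both sets, and every held-out pair involves domains the classifier has already seen.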

Introduction

Proteins perform many vital functions in living organisms, with most depending on interactions with other molecules. Among these interactions, protein–protein interactions (PPI) are involved in maintaining cellular structure, regulating protein function, facilitating cellular transport, and encoding for the scaffold where most, if not all, cellular events take place [1,2,3]. Identifying these PPI represents an important effort toward characterizing the molecular mechanisms at play in different living organisms. Other descriptors, such as protein sequence composition [13,14], genomic data [15,16], and protein three-dimensional (3D) structures [17,18], among others [19,20], have been described to represent proteins for PPI prediction.
