Abstract

Deciphering the interface residues of protein complexes provides insight into the functions of the proteins and, hence, the overall cellular machinery. Computational methods have been devised in the past to predict interface residues from amino acid sequence information, but these methods have mostly been applied to prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence differ between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes. Here, we report a new hybrid pipeline for predicting protein-protein interaction interfaces in a pairwise manner from the amino acid sequences of the interacting proteins. The pipeline, named CoRNeA, is based on a framework of Co-evolution, machine learning (Random Forest), and Network Analysis, and is trained specifically on eukaryotic protein complexes. We use co-evolution, physicochemical properties, and contact potential as the major groups of features to train the Random Forest classifier. We also incorporate the intra-contact information of the individual proteins to eliminate false positives from the predictions, keeping in mind that the amino acid sequence of a protein holds information for its own folding and not only for its interface propensities. Our predictions on example datasets show that CoRNeA not only enhances the prediction of true interface residues but also reduces false positive rates significantly.

Highlights

  • The biological machinery performs its cellular functions when its basic units, such as DNA, RNA, and proteins, interact with each other

  • The other features derived for the Random Forest classifier are based on the physicochemical properties of the amino acids, which depend on their side-chain structure, such as charge, size, and hydrophobic compatibility; secondary structure information and relative solvent accessibility were also derived from amino acid sequence information

  • The Random Forest classifier is a tree-based algorithm in which the classification rules are learned from the feature values and the target classes provided during training
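
To make the idea of learning classification rules from pairwise features concrete, the following is a minimal sketch and not the CoRNeA implementation. It assumes only two toy features per residue pair (a hydropathy difference taken from the Kyte-Doolittle scale and a simplified charge product), whereas the actual pipeline also uses co-evolution, contact potential, secondary structure, and solvent accessibility. The function names (`pair_features`, `build_tree`, `train_forest`, `predict_forest`) and the toy labels are hypothetical and invented purely for illustration; the hand-written forest stands in for a library implementation that would normally be used in practice.

```python
import random
from collections import Counter

# Kyte-Doolittle hydropathy scale (a standard published scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charges near neutral pH (an assumption for this sketch).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def pair_features(a, b):
    """Toy feature vector for a residue pair (one residue from each protein)."""
    return [
        abs(HYDROPATHY[a] - HYDROPATHY[b]),   # hydropathy difference
        CHARGE.get(a, 0) * CHARGE.get(b, 0),  # -1 opposite, +1 like, 0 otherwise
    ]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; each split considers a single randomly chosen feature."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted class
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold)
    if best is None:
        return majority  # chosen feature cannot separate the samples
    t = best[1]
    lo = [i for i, r in enumerate(rows) if r[feature] <= t]
    hi = [i for i, r in enumerate(rows) if r[feature] > t]
    return (
        feature, t,
        build_tree([rows[i] for i in lo], [labels[i] for i in lo], rng, depth + 1, max_depth),
        build_tree([rows[i] for i in hi], [labels[i] for i in hi], rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        feature, t, left, right = node
        node = left if row[feature] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Invented toy labels (1 = contact, 0 = no contact); NOT real interface data.
pairs = [("D", "K"), ("E", "R"), ("L", "I"), ("V", "F"),
         ("D", "E"), ("K", "R"), ("L", "D"), ("K", "I")]
y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [pair_features(a, b) for a, b in pairs]

forest = train_forest(X, y)
print(predict_forest(forest, pair_features("E", "K")))
```

Each tree sees a different bootstrap sample and a random feature at every split, so individual trees disagree; the majority vote is what gives the ensemble its stability.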

Introduction

The biological machinery performs its cellular functions when its basic units, such as DNA, RNA, and proteins, interact with each other. Various experimental methods are available for examining these interactions, such as yeast two-hybrid (Y2H) [1], co-immunoprecipitation (co-IP) [2], and mass spectrometry [3], but they provide information only about the domains necessary for maintaining the interaction or about the proximity of the interacting partners. These methods are also labor-, cost-, and time-intensive. Deciphering the protein-protein interaction interfaces (PPII) at the highest resolution through X-ray crystallography or cryo-electron microscopy is even more challenging because of the intrinsic technical difficulties of these methods.
