Abstract

Protein-protein interactions (PPIs) play key roles in life processes such as signal transduction, transcriptional regulation, and immune response. Identifying PPIs enables a better understanding of the functional networks within a cell. Common experimental methods for identifying PPIs are time-consuming and expensive. Recent computational approaches that infer PPIs from protein sequences based on coevolution theory avoid these problems. Under the coevolution model, interacting proteins may show coevolutionary mutations and have similar phylogenetic trees. Existing coevolution methods depend on multiple sequence alignments (MSAs); however, MSA-based methods often produce a high rate of false-positive interactions. In this paper, we present an alignment-free computational method to accurately detect PPIs and reduce false positives. In the method, protein sequences are represented numerically by the biochemical properties of their amino acids, which reflect the structural and functional differences of proteins. The Fourier transform is applied to this numerical representation to capture the dissimilarities of protein sequences in a biophysical context. The method is assessed for predicting PPIs in Ebola virus. The results indicate strong coevolution between the protein pairs NP-VP24, NP-VP30, NP-VP40, VP24-VP30, VP24-VP40, and VP30-VP40. The method is also validated for PPIs in influenza and E. coli genomes. Because our method can reduce false positives and increase the specificity of PPI prediction, it offers an effective tool for understanding the mechanisms of disease pathogens and finding potential targets for drug design. The Python programs used in this study are publicly available at https://github.com/cyinbox/PPI.
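The pipeline outlined in the abstract (numerical encoding, Fourier transform, comparison of evolutionary distances between two protein families) can be illustrated with a minimal sketch. This is not the authors' code from the repository. The Kyte-Doolittle hydropathy scale, the zero-padding used to compare sequences of different lengths, the Euclidean distance, and the Pearson correlation of distance matrices are all assumptions chosen for illustration. The function names and toy sequences are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy values: an ASSUMED choice of biochemical property;
# the abstract does not say which property the authors use.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map a protein sequence to a numeric signal, one value per residue."""
    return np.array([HYDROPATHY[aa] for aa in seq.upper()], dtype=float)


def power_spectrum(seq, n):
    """DFT power spectrum of the mean-centered signal, zero-padded to length n.

    Zero-padding is a simplification assumed here so that spectra of
    sequences with different lengths have the same number of components.
    """
    signal = encode(seq)
    signal = signal - signal.mean()
    return np.abs(np.fft.fft(signal, n=n)) ** 2


def distance_matrix(seqs):
    """Pairwise Euclidean distances between power spectra (alignment-free)."""
    n = max(len(s) for s in seqs)
    spectra = np.array([power_spectrum(s, n) for s in seqs])
    diff = spectra[:, None, :] - spectra[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def coevolution_score(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two distance matrices.

    Both matrices must list the same strains in the same order. A value near 1
    suggests that the two proteins have similar evolutionary histories.
    """
    upper = np.triu_indices_from(dist_a, k=1)
    return float(np.corrcoef(dist_a[upper], dist_b[upper])[0, 1])


# Hypothetical toy sequences standing in for one protein across four strains.
protein_a = ["MKTAYIAKQR", "MKTAYLAKQR", "MRTGYIAEQRL", "MKSAYIVKQR"]
# Hypothetical toy sequences for a second protein in the same four strains.
protein_b = ["GSHMLEDPVA", "GSHMLEDPIA", "GTHMIEDPVAK", "GSHMLQDPVA"]

score = coevolution_score(distance_matrix(protein_a), distance_matrix(protein_b))
print(f"coevolution score: {score:.3f}")
```

In this sketch no alignment is computed at any step: each sequence is compared through its spectrum, and the two proteins are compared only through their distance matrices.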

Highlights

  • Proteins are essential molecules in all biological systems in a cell, with most proteins requiring protein-protein interactions (PPIs) to function effectively

  • To demonstrate the effectiveness of the discrete Fourier transform (DFT) measure of protein sequences, we perform a phylogenetic analysis of Ebola virus based on the NP protein, using both the DFT method and multiple sequence alignment (MSA)

  • This result demonstrates that a phylogenetic tree built from a multiple sequence alignment (MSA) may not reflect the true physicochemical changes of amino acids in protein mutations, and may cause a high false-positive rate in coevolution analysis


Summary

Introduction

Proteins are essential molecules in all biological systems in a cell, with most proteins requiring protein-protein interactions (PPIs) to function effectively. For example, transport proteins interact with structural proteins, and hormone peptides interact with receptors. Some proteins form structural complexes, and the interactions among different protein complexes are necessary for cell functions. Protein interactions are fundamentally characterized as stable or transient, and both types of interactions can be either strong or weak.

