Abstract

BackgroundProtein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes.ResultsWe proposed a novel, simple method to account for some of the intra-matrix correlations in improving the prediction accuracy. Specifically, the phylogenetic species tree of the reference genomes is used as a guide tree for hierarchical clustering of the orthologous proteins. The distances between these clusters, derived from the original pairwise distance matrix using the Neighbor Joining algorithm, form intermediate distance matrices, which are then transformed and concatenated into a super phylogenetic vector. A support vector machine is trained and tested on pairs of proteins, represented as super phylogenetic vectors, whose interactions are known. The performance, measured as ROC score in cross validation experiments, shows significant improvement of our method (ROC score 0.8446) over that of using Pearson correlations (0.6587).ConclusionWe have shown that the phylogenetic tree can be used as a guide to extract intra-matrix correlations in the distance matrices of orthologous proteins, where these correlations are represented as intermediate distance matrices of the ancestral orthologous proteins. Both the unsupervised and supervised learning paradigms benefit from the explicit inclusion of these intermediate distance matrices, and particularly so in the latter case, which offers a better balance between sensitivity and specificity in the prediction of protein-protein interactions.

Highlights

  • Protein-protein interactions are critical for cellular functions

  • Protein-protein interactions play a key role in cellular functions, and to complement the experimental approaches [1,2], many computational methods have recently been developed in systems biology for predicting whether two proteins interact, based on what is already known about these proteins

  • The implementation of the support vector machine is adopted from the SVMLight package [27]

Read more

Summary

Introduction

Protein-protein interactions are critical for cellular functions. Recently developed computational approaches for predicting protein-protein interactions utilize co-evolutionary information of the interacting partners, e.g., correlations between distance matrices, where each matrix stores the pairwise distances between a protein and its orthologs from a group of reference genomes. Protein-protein interactions play a key role in cellular functions, and to complement the experimental approaches [1,2], many computational methods have recently been developed in systems biology for predicting whether two proteins interact, based on what is already known about these proteins. The mirror tree method predicts protein-protein interactions under the assumption that the interacting proteins show similarity in the molecular phylogenetic protein trees because of the co-evolution caused by the interaction. The mirror tree method compares a pair of distance matrices by calculating the Pearson correlation coefficient for the corresponding elements in the two matrices, and uses the correlation coefficient as a measure to evaluate the extent of coevolutionary behavior between two proteins

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.