Abstract
Background: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein-coding genes in the same cluster or operon; they are collectively known as genomic context methods. A further method, mirrortree, is based on the similarity of the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.
Methods: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria according to the phylogenetic diversity of, and relationships between, the organisms. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods.
Conclusions: Higher performance for predicting protein-protein interactions was achievable even with 100–150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that, in order to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such sampling allows 50–100 genomes to be selected for comparable prediction accuracy when computational resources are limited.
Highlights
In the last few years, computational methods of predicting physical and functional Protein-Protein Interaction (PPI) have gained popularity [1,2,3,4,5].
Generation of reference genome sets: In order to evaluate the effect of reference genome selection on PPI predictions, the 565 reference genomes used in this study were grouped into six sets: ALL, BAAC, BAS, BAC, GAMMA and BANR.
Our results suggest that the performance of the Binary Phylogenetic Profile Method (BPPM) depends far more strongly on reference genome selection than that of the similarity-based Phylogenetic Profile Method (SPPM).
Summary
In the last few years, computational methods of predicting physical and functional Protein-Protein Interaction (PPI) have gained popularity [1,2,3,4,5]. PPI networks help in understanding the organization and the higher-order functional relationships of proteins in various cellular processes [9,10,11,12]. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history, which can be traced for all possible pairs of proteins present in the query genome (the genome of interest). This is done by correlating different evolutionary aspects of their homologs in multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein-coding genes in the same cluster or operon. Very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.
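To make the idea of binary phylogenetic profiling concrete, the following is a minimal sketch rather than the scoring scheme used in the study. It assumes each protein is represented by a presence/absence (1/0) profile across reference genomes and that two profiles are compared with a Pearson correlation; the genome labels, the protein profiles and the function names `pearson` and `profile_similarity` are all hypothetical. The toy data also show why the choice of reference genomes matters: if every selected genome contains both proteins, the profiles carry no signal.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has zero variance, i.e. when the
    profile carries no information over the chosen genomes.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def profile_similarity(profile_a, profile_b, genomes):
    """Compare two binary profiles over a chosen reference genome set."""
    xs = [profile_a[g] for g in genomes]
    ys = [profile_b[g] for g in genomes]
    return pearson(xs, ys)


# Hypothetical presence (1) / absence (0) of homologs in six toy genomes.
protein_a = {"g1": 1, "g2": 1, "g3": 0, "g4": 0, "g5": 1, "g6": 0}
protein_b = {"g1": 1, "g2": 1, "g3": 0, "g4": 1, "g5": 1, "g6": 0}

all_genomes = ["g1", "g2", "g3", "g4", "g5", "g6"]
# A subset of closely related genomes in which both proteins are always present.
related_only = ["g1", "g2", "g5"]

score_all = profile_similarity(protein_a, protein_b, all_genomes)
score_related = profile_similarity(protein_a, protein_b, related_only)
print(score_all, score_related)
```

In this toy case the diverse genome set yields a clearly positive correlation, while the narrow subset yields no usable score because neither profile varies over it. Other similarity measures (for example mutual information or Jaccard similarity) could be substituted for the correlation without changing the overall picture.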