Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling

Gabriella Sferra, Marta Ponzi, Federica Fratini, Elisabetta Pizzi

doi:10.1186/s12859-017-1815-5

Journal: BMC Bioinformatics
Publication Date: Sep 5, 2017
Citations: 5
License type: open-access

Affiliation: Istituto Superiore di Sanità

Abstract

Background

Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole-genome level in both Prokaryotes and Eukaryotes. For this reason it is considered one of the most promising methods.

Results

Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods based on mutual information and Pearson’s correlation as measures of profile similarity.

Conclusions

In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.

Highlights

Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era
We adopted a new strategy of genome selection to obtain unbiased and large reference sets of genomes
A second reference set (RS2) was generated from reference set 1 (RS1) by excluding eukaryotic genomes with a “peripheral” attribute until 45 eukaryotic genomes remained, thereby changing the ratio from 5:1 to 13:1

Summary

Results

We propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods based on mutual information and Pearson’s correlation as measures of profile similarity.
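
To make the similarity measure concrete, the following is a minimal Python sketch of the standard sample distance correlation between two phylogenetic profiles. It is not the authors' Phylo-dCor code (their scripts are in R and parallelized); the function names, the toy profiles, and the use of binary presence/absence values are assumptions made purely for illustration.

```python
import math


def centered_distance_matrix(profile):
    """Pairwise absolute differences of a 1-D profile, double-centered."""
    n = len(profile)
    dist = [[abs(a - b) for b in profile] for a in profile]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    a = centered_distance_matrix(x)
    b = centered_distance_matrix(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)    # squared sample distance covariance
    dvar2_x = mean_product(a, a)  # squared sample distance variance of x
    dvar2_y = mean_product(b, b)  # squared sample distance variance of y
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        # A constant profile carries no information; define the result as 0.
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


# Hypothetical presence/absence profiles of three proteins across 8 genomes.
protein_a = [1, 1, 0, 1, 0, 0, 1, 0]
protein_b = [1, 1, 0, 1, 0, 1, 1, 0]
protein_c = [0, 1, 1, 0, 1, 0, 0, 1]

print(distance_correlation(protein_a, protein_b))
print(distance_correlation(protein_a, protein_c))
```

Identical profiles score 1 up to floating-point error, and a constant profile scores 0 by the convention chosen here. This sketch computes a single pair in quadratic time and memory in the number of genomes; an all-against-all comparison over a whole proteome is the part that a parallelized implementation would need to distribute.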

Conclusions

Background

Results and discussion