Abstract

ProtPhylo is a web-based tool to identify proteins that are functionally linked to either a phenotype or a protein of interest based on co-evolution. ProtPhylo infers functional associations by comparing protein phylogenetic profiles (co-occurrence patterns of orthology relationships) for more than 9.7 million non-redundant protein sequences from all three domains of life. Users can query any of 2048 fully sequenced organisms, including 1678 bacteria, 255 eukaryotes and 115 archaea. In addition, they can tailor ProtPhylo to a particular kind of biological question by choosing among four main orthology inference methods based either on pair-wise sequence comparisons (One-way Best Hits and Best Reciprocal Hits) or clustering of orthologous proteins across multiple species (OrthoMCL and eggNOG). Next, ProtPhylo ranks phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance as a measure of similarity between pairs of phylogenetic profiles. Candidate hits can be easily and flexibly prioritized by complementary clues on subcellular localization, known protein–protein interactions, membrane spanning regions and protein domains. The resulting protein list can be quickly exported into a csv text file for further analyses. ProtPhylo is freely available at http://www.protphylo.org.
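The ranking step described above can be illustrated with a minimal sketch. It assumes each phylogenetic profile is a presence/absence string with one character per organism; the protein names, the toy profiles and the helper names `hamming_distance` and `rank_neighbors` are hypothetical and are not taken from ProtPhylo itself.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of organisms")
    return sum(x != y for x, y in zip(a, b))


def rank_neighbors(query: str, profiles: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank all other proteins by Hamming distance to the query profile.

    Smaller distances mean more similar co-occurrence patterns.
    Ties are broken alphabetically so the output is deterministic.
    """
    q = profiles[query]
    hits = [
        (name, hamming_distance(q, profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Toy data (hypothetical): one character per organism, '1' = ortholog present.
profiles = {
    "protA": "110110",
    "protB": "110100",
    "protC": "001001",
    "protD": "110110",
}

print(rank_neighbors("protA", profiles))
# [('protD', 0), ('protB', 1), ('protC', 6)]
```

In this toy example, protD shares the exact profile of the query and ranks first, while protC has the complementary pattern and ranks last.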

Highlights

  • Advances in sequencing technologies and genome annotation tools continuously increase the repertoire of protein-coding genes in numerous organisms

  • The output list can be further prioritized directly in ProtPhylo based on five complementary filtering criteria: cut-off Hamming distance (HD) values and percentile; combined (And) or stand-alone (Or) evidence of subcellular localization; presence (>0) or absence (=0) of transmembrane helices; presence of conserved Pfam domains; keywords; confidence score for functional associations predicted by STRING [21] (STRING score)

  • To emphasize the suitability of ProtPhylo as a discovery tool, we validated its performance with datasets of known human protein complexes (CORUM [44]), cellular components from the Gene Ontology (GO) database [45] and metabolic and signaling pathways from KEGG [46]
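The filtering criteria listed in the highlights can be sketched as a chain of optional predicates. This is only an illustration under stated assumptions: the `Hit` record, its field names and the `filter_hits` parameters are hypothetical, the And/Or switch is interpreted here as "all" versus "any" of the requested localizations, and the keyword and percentile filters are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Hit:
    """One candidate protein; every field name here is hypothetical."""
    name: str
    hd: int                                    # Hamming distance to the query
    localization: Set[str] = field(default_factory=set)
    tm_helices: int = 0                        # predicted transmembrane helices
    pfam: Set[str] = field(default_factory=set)
    string_score: float = 0.0


def filter_hits(
    hits: Iterable[Hit],
    max_hd: Optional[int] = None,
    localization: Optional[Set[str]] = None,
    localization_mode: str = "or",
    has_tm: Optional[bool] = None,
    pfam: Optional[Set[str]] = None,
    min_string_score: Optional[float] = None,
) -> List[Hit]:
    """Keep hits that pass every filter that was supplied (None = not applied)."""
    kept = []
    for h in hits:
        if max_hd is not None and h.hd > max_hd:
            continue
        if localization:
            if localization_mode == "and":
                ok = localization <= h.localization        # all requested sites
            else:
                ok = bool(localization & h.localization)   # at least one site
            if not ok:
                continue
        if has_tm is not None and (h.tm_helices > 0) != has_tm:
            continue
        if pfam and not (pfam & h.pfam):
            continue
        if min_string_score is not None and h.string_score < min_string_score:
            continue
        kept.append(h)
    return kept


# Toy hits (hypothetical values).
hits = [
    Hit("protB", 1, {"mitochondrion"}, 2, {"PF00153"}, 0.9),
    Hit("protC", 6, {"cytosol"}, 0, set(), 0.2),
    Hit("protD", 0, {"mitochondrion", "nucleus"}, 0, {"PF00153"}, 0.7),
]

selected = filter_hits(hits, max_hd=2, localization={"mitochondrion"}, has_tm=True)
print([h.name for h in selected])
# ['protB']
```

Leaving each filter optional mirrors the idea that the criteria are complementary: any subset can be combined without changing the others.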

Introduction

Advances in sequencing technologies and genome annotation tools continuously increase the repertoire of protein-coding genes in numerous organisms. One method for inferring functional links among these genes is phylogenetic profiling [6], whose predictive power increases as more sequenced genomes from diverse taxonomic groups become available [7]. ProtPhylo achieves flexibility and state-of-the-art taxonomic and functional coverage by generating phylogenetic profiles for 9.7 million non-redundant protein sequences across 2048 organisms and by implementing four independent orthology detection algorithms.
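As a rough illustration of how a phylogenetic profile could be derived from one of the pairwise orthology methods, the sketch below marks an organism as "present" when the query protein has a best reciprocal hit there. The input dictionaries, organism names and function names are hypothetical; the sketch assumes best-hit tables have already been computed by a sequence search and does not reproduce ProtPhylo's actual pipeline.

```python
from typing import Dict, List

BestHits = Dict[str, str]  # protein -> its best-scoring hit in the other genome


def reciprocal_best_hits(forward: BestHits, backward: BestHits) -> BestHits:
    """Keep query -> target pairs whose target points back to the same query."""
    return {q: t for q, t in forward.items() if backward.get(t) == q}


def build_profile(
    protein: str,
    organisms: List[str],
    forward_by_org: Dict[str, BestHits],
    backward_by_org: Dict[str, BestHits],
) -> str:
    """Return a presence/absence string with one character per organism."""
    bits = []
    for org in organisms:
        rbh = reciprocal_best_hits(
            forward_by_org.get(org, {}), backward_by_org.get(org, {})
        )
        bits.append("1" if protein in rbh else "0")
    return "".join(bits)


# Toy best-hit tables (hypothetical): q* are query-genome proteins.
organisms = ["orgA", "orgB", "orgC"]
forward_by_org = {
    "orgA": {"q1": "a1", "q2": "a2"},
    "orgB": {"q1": "b1"},
}
backward_by_org = {
    "orgA": {"a1": "q1", "a2": "q3"},  # a2 prefers q3, so q2 has no reciprocal hit
    "orgB": {"b1": "q1"},
}

print(build_profile("q1", organisms, forward_by_org, backward_by_org))  # 110
print(build_profile("q2", organisms, forward_by_org, backward_by_org))  # 000
```

Dropping the reciprocity check in `reciprocal_best_hits` would give a one-way best hit variant, which is more permissive and therefore yields denser profiles.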

