Abstract

Background

A new paradigm of biological investigation takes advantage of technologies that produce large, high-throughput datasets, including genome sequences, protein interactions, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins lack the functional evidence necessary to infer their biological relevance.

Results

Here we have applied high-confidence function predictions from our automated prediction system, PFP, to three genome sequences: Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria). PFP increases the proportion of annotated genes to over 90% in all three genomes. Using this large coverage of function annotation, we introduce functional similarity networks, which represent the functional space of the proteomes. Four functional similarity networks are constructed for each proteome: one for each single Gene Ontology (GO) category (Biological Process, Cellular Component, and Molecular Function) and a fourth based on overall similarity measured with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks in showing a higher clustering degree exponent, and thus has a higher tendency to be hierarchical. In addition, examining function assignments in the protein-protein interaction network and in local regions of genomes has identified numerous cases where subnetworks or local regions contain functionally coherent proteins. These results will help in interpreting protein interactions and gene orders in a genome. Several examples of both analyses are highlighted.

Conclusion

The analyses demonstrate that applying high-confidence predictions from PFP can have a significant impact on researchers' ability to interpret the immense biological data being generated today. The newly introduced functional similarity networks of the three organisms show network properties that differ from those of the protein-protein interaction networks.
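As an illustration of the network analysis described in the Results, the sketch below builds a similarity network by keeping protein pairs whose score meets a cutoff, then averages the local clustering coefficient over nodes of equal degree. It is a minimal sketch under stated assumptions: the function names, the toy scores, and the 0.5 cutoff are hypothetical and are not taken from the paper. In the usual hierarchical-network analysis, the clustering degree exponent is estimated as the negative slope of log C(k) against log k.

```python
from collections import defaultdict
from itertools import combinations


def build_network(scores, threshold):
    """Return adjacency sets for pairs whose similarity is >= threshold.

    scores: dict mapping (protein_a, protein_b) -> similarity score.
    The threshold is an assumed parameter, not a value from the paper.
    """
    adj = defaultdict(set)
    for (a, b), s in scores.items():
        if a != b and s >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """Average clustering coefficient C(k) for each degree k in the network."""
    by_k = defaultdict(list)
    for node in list(adj):
        by_k[len(adj[node])].append(clustering_coefficient(adj, node))
    return {k: sum(cs) / len(cs) for k, cs in by_k.items()}


# Toy scores (hypothetical); A-D falls below the cutoff and is dropped.
toy_scores = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("A", "C"): 0.7,
    ("C", "D"): 0.95,
    ("A", "D"): 0.2,
}
network = build_network(toy_scores, threshold=0.5)
c_of_k = clustering_by_degree(network)
```

With real data, fitting a line to log C(k) versus log k over the returned dictionary would give an estimate of the exponent; the fit is omitted here because the toy network is too small for it to be meaningful.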

Highlights

  • Four different functional similarity networks are generated by using the function annotation in the three Gene Ontology (GO) categories, namely, Biological Process (BP), Cellular Component (CC), Molecular Function (MF), and by using the funSim score, which evaluates the overall functional similarity among the three GO categories. funSim uses the hierarchical structure of GO and information content of common ancestors of predicted and actual terms [14,38]

  • Enrichment of function annotation by PFP: we have previously shown that PFP can make more accurate function predictions than existing methods and can significantly increase the coverage of function assignment in a genome [13,14]
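The overall funSim score mentioned in the highlights can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes a commonly used formulation in which each per-category similarity score is divided by its maximum attainable value, squared, and averaged over the three GO categories, and it assumes the information content of a term is the negative log of its annotation frequency. The function and parameter names are hypothetical.

```python
import math


def information_content(term_probability):
    """Information content of a GO term: rarer terms are more informative.

    term_probability is the assumed fraction of annotations that use the
    term or any of its descendants; it must lie in (0, 1].
    """
    if not 0.0 < term_probability <= 1.0:
        raise ValueError("term_probability must be in (0, 1]")
    # max() normalizes the -0.0 produced for a probability of exactly 1
    return max(0.0, -math.log(term_probability))


def funsim(bp, mf, cc, max_bp=1.0, max_mf=1.0, max_cc=1.0):
    """Combine per-category scores into one overall score in [0, 1].

    Assumed formulation: mean of squared, max-normalized category scores.
    """
    terms = [
        (bp / max_bp) ** 2,
        (mf / max_mf) ** 2,
        (cc / max_cc) ** 2,
    ]
    return sum(terms) / len(terms)


# Two proteins that agree fully in one category and not at all in the others
example = funsim(bp=1.0, mf=0.0, cc=0.0)
```

Squaring the normalized scores means that strong agreement in one category contributes more than weak agreement spread across all three, which is one plausible reason an overall score behaves differently from any single-category score.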

Summary

Introduction

A new paradigm of biological investigation takes advantage of technologies that produce large, high-throughput datasets, including genome sequences, protein interactions, and gene expression. Approaches to function prediction include homology searches [18,19], methods that use protein tertiary structure information [20,21,22,23], methods that consider conservation of gene locations in genome sequences [24,25], and methods that utilize protein-protein interaction (PPI) data [26,27,28]. The hierarchy of biological networks was first observed in metabolic pathway networks [39]. This might imply that the funSim score of the three organisms studied captures, to some extent, the structure of relationships between proteins in pathways. We present several interesting and potentially useful individual cases from each of the analyses, and provide extensive supplementary data for all of the methods discussed.
