Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks.

Nikolai Daraselia, Anton Yuryev, Ilya Mazo, Iaroslav Ispolatov, Sergei Egorov

doi:10.1186/1471-2105-8-243

Open Access

https://doi.org/10.1186/1471-2105-8-243

Journal: BMC Bioinformatics
Publication Date: Jul 10, 2007
Citations: 85
License type: cc-by

Affiliation: Ariadne Diagnostics (United States)

Abstract

Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work and, often, sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of the networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to having general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets.

Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates and compared the performance of the automatic extraction technology to that of the manual curation used in public Gene Ontology (GO) annotation. In the second part, we report a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly denser linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups without any noticeable decrease of the link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We found that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller.

Conclusion: Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with the clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence for their functional similarity.
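The "non-random overlap" between a network community and a GO group can be made concrete with a significance test. The abstract does not say which test the authors used; the sketch below assumes a one-sided hypergeometric test, a common choice for set overlap. The function name `overlap_p_value` and the toy protein names are illustrative assumptions, not taken from the paper.

```python
from math import comb


def overlap_p_value(population: int, go_size: int, cluster_size: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` shared proteins by chance.

    Models drawing `cluster_size` proteins without replacement from a
    network of `population` proteins, `go_size` of which carry the GO term.
    """
    upper = min(go_size, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(go_size, k) * comb(population - go_size, cluster_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy data (hypothetical protein names, not taken from the paper).
network_proteins = {f"P{i}" for i in range(1, 11)}  # 10 proteins in the network
go_group = {"P1", "P2", "P3", "P4"}                 # proteins sharing one GO term
cluster = {"P1", "P2", "P3"}                        # one densely linked community

shared = len(go_group & cluster)
p_value = overlap_p_value(len(network_proteins), len(go_group), len(cluster), shared)
print(f"shared={shared}, p={p_value:.4f}")
```

A small p-value means the community shares more proteins with the GO group than random sampling would typically produce. In a real analysis with many communities and GO groups, a multiple-testing correction would also be needed.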

Highlights

Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work and, often, sophisticated data mining and processing tools.
Evaluation of protein-Gene Ontology (GO) association extracted by MedScan technology: The extension of the MedScan natural language processing technology to detect GO terms and protein-GO associations is described in the Methods section and in Additional file 1.
Higher-than-average number of protein interactions within GO annotations: To test the hypothesis that cellular functional modularity is achieved through increased link density in the molecular interaction network, and to further study MedScan extraction accuracy, we investigated whether proteins within a GO group are more likely to interact with each other than with arbitrary network proteins.
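The link-density comparison in the last highlight can be illustrated with a minimal sketch. It assumes that density is the fraction of protein pairs inside a group that are linked, and that the chance baseline is the average density of random groups of the same size. Both definitions, the function names, and the toy network are assumptions for illustration; the paper's exact procedure may differ.

```python
import random
from itertools import combinations


def link_density(members, edges):
    """Fraction of member pairs that are connected by an edge."""
    members = sorted(set(members))
    pairs = len(members) * (len(members) - 1) // 2
    if pairs == 0:
        return 0.0
    linked = sum(1 for a, b in combinations(members, 2) if frozenset((a, b)) in edges)
    return linked / pairs


def random_baseline(nodes, edges, size, trials=1000, seed=0):
    """Average link density of randomly drawn groups of the given size."""
    rng = random.Random(seed)
    nodes = sorted(nodes)
    total = sum(link_density(rng.sample(nodes, size), edges) for _ in range(trials))
    return total / trials


# Toy undirected network (hypothetical proteins and interactions).
nodes = {"P1", "P2", "P3", "P4", "P5", "P6"}
edges = {
    frozenset(pair)
    for pair in [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
}
go_group = ["P1", "P2", "P3"]  # proteins annotated with the same GO term

observed = link_density(go_group, edges)
baseline = random_baseline(nodes, edges, size=len(go_group))
print(f"observed density={observed:.2f}, random baseline={baseline:.2f}")
```

An observed density well above the baseline is the pattern the highlight describes: proteins in the same GO group are linked to each other more often than arbitrary proteins are.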

Summary

Introduction

Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work and, often, sophisticated data mining and processing tools. Since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. Numerous attempts to detect modules in biological networks have been described [2,3,4]. In many of these studies, the Gene Ontology [5] (GO) has been used as the "gold standard" to validate the functional relevance of the network clusters found [6,7]. GO is a directed acyclic graph of terms (nodes) connected by links representing two types of term relations: "is-a" and "part-of." GO has three major branches covering the corresponding aspects of protein function: biological process, molecular function, and cellular component. The approaches for assigning GO terms to proteins can be grouped into two major classes
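The graph structure of GO described above can be sketched as a small data structure. The example assumes the common convention that a protein annotated with a term also belongs to the groups of all ancestor terms reached through "is-a" or "part-of" links. The text above does not state whether the authors propagate annotations this way, and the term and protein identifiers are hypothetical.

```python
from collections import defaultdict

# child term -> list of (parent term, relation type); toy terms, not real GO IDs
parents = {
    "toy:B": [("toy:A", "is-a")],
    "toy:C": [("toy:A", "part-of")],
    "toy:D": [("toy:B", "is-a"), ("toy:C", "part-of")],
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent, _relation in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents):
    """Map each term to the proteins annotated with it or any descendant."""
    groups = defaultdict(set)
    for protein, terms in direct.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                groups[t].add(protein)
    return dict(groups)


# Direct annotations (hypothetical proteins).
direct = {"P1": {"toy:D"}, "P2": {"toy:C"}}
groups = propagate(direct, parents)
for term in sorted(groups):
    print(term, sorted(groups[term]))
```

Because a term can have several parents, as `toy:D` does here, the structure is a graph and not a tree, and the `seen` set keeps shared ancestors from being visited twice.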
