Improving protein function prediction methods with integrated literature data

Aaron P Gabow, Lawrence E Hunter, Sonia M Leach, William A Baumgartner, Debra S Goldberg

doi:10.1186/1471-2105-9-198

Abstract

Background

Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence, and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.

Results

We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes, which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial.

Conclusion

Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.

Highlights

Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale
Complementing the protein-protein interaction (PPI) data with co-occurrence data: using the most general definition of co-occurrence, whereby an interaction exists between two proteins mentioned at least twice together in the literature, co-occurrence data was a significant source of interactions for all organisms (Table 3)
Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit
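
The count-based definition of co-occurrence mentioned in the highlights can be illustrated with a minimal sketch. This is not the authors' reliability measure, which is not described on this page; it only shows the simple idea of asserting an edge when two proteins are mentioned together in at least some number of abstracts. The function name `cooccurrence_edges`, the `min_count` parameter, and the protein names are hypothetical, and the input is assumed to be a list of abstracts already reduced to the protein identifiers they mention.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(abstract_mentions, min_count=1):
    """Return {(a, b): count} for protein pairs co-mentioned in >= min_count abstracts.

    abstract_mentions: iterable of lists of protein identifiers, one list per
    abstract (hypothetical pre-processed input, not a format from the paper).
    """
    counts = Counter()
    for proteins in abstract_mentions:
        # De-duplicate within an abstract and sort so each pair has one canonical key.
        for pair in combinations(sorted(set(proteins)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy data: three abstracts and the (made-up) proteins each one mentions.
abstracts = [
    ["YFG1", "YFG2"],
    ["YFG1", "YFG2", "YFG3"],
    ["YFG3", "YFG4"],
]

loose = cooccurrence_edges(abstracts, min_count=1)   # any single co-mention counts
strict = cooccurrence_edges(abstracts, min_count=2)  # require two shared abstracts
print(len(loose), len(strict))
```

Raising `min_count` trades coverage for fewer spurious edges, which is the same coverage-versus-accuracy tension the abstract describes for co-occurrence thresholds.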

Summary

Introduction

Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. The putative characterization of unknown proteins has traditionally relied on sequence homology, for example, as assessed by BLAST score. This approach is inadequate for proteome-wide function identification, as it has a failure rate of 20–40% in newly sequenced genomes [1]. New methods for proteome-scale function prediction that do not rely on sequence homology draw on high-throughput data to make inferences, including several techniques that use protein-protein interaction graphs [1,3,4,5,6,7,8]. One obvious question is how useful each of these sources is to a graph-theoretic function prediction algorithm.
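
To make the idea of predicting function from annotated neighbors concrete, here is a minimal sketch of a local majority vote over an interaction graph. It is deliberately much simpler than Functional Flow, which is a global method, and it is not the procedure evaluated in the paper. All names (`build_graph`, `predict_by_neighbor_vote`), the proteins, and the annotations are hypothetical toy values.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) edge pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def predict_by_neighbor_vote(graph, annotations, protein):
    """Return the function most common among a protein's annotated neighbors.

    Ties are broken alphabetically so the result is deterministic.
    Returns None when no neighbor carries an annotation.
    """
    votes = Counter()
    for neighbor in graph.get(protein, ()):
        votes.update(annotations.get(neighbor, ()))
    if not votes:
        return None
    best = max(votes.values())
    return min(func for func, n in votes.items() if n == best)


# Toy example: "U" is the unannotated protein.
ppi_edges = [("P1", "U"), ("P2", "P3")]
cooc_edges = [("U", "P2"), ("U", "P3")]  # assumed literature co-occurrence edges
annotations = {
    "P1": {"transport"},
    "P2": {"DNA repair"},
    "P3": {"DNA repair"},
}

ppi_only = predict_by_neighbor_vote(build_graph(ppi_edges), annotations, "U")
combined = predict_by_neighbor_vote(
    build_graph(ppi_edges + cooc_edges), annotations, "U"
)
print(ppi_only, combined)
```

In this toy graph the extra co-occurrence edges connect the unknown protein to more annotated nodes and change the vote, which illustrates how additional links into well-characterized regions can affect a graph-based prediction. Whether the change is an improvement depends on the reliability of the added edges.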

Journal: BMC Bioinformatics
Publication Date: Apr 15, 2008
Citations: 49
License type: CC BY 2.0
