Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data

Naoki Nariai, Eric D Kolaczyk, Simon Kasif

doi:10.1371/journal.pone.0000337


Open Access

https://doi.org/10.1371/journal.pone.0000337


Journal: PLoS ONE
Publication Date: Mar 28, 2007
Citations: 130
License type: CC BY 4.0

Affiliation: Boston University

Abstract

Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function than proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype, and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study, we show that the integrated model increases recall by 18% at 50% precision, compared to using PPI data alone. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement over PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
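The abstract reports recall at a fixed precision of 50%. As a rough illustration of how such a figure can be computed from ranked predictions, the sketch below scans a score-sorted list and keeps the highest recall reached while precision stays at or above the target. The function name `recall_at_precision` and its arguments are hypothetical, tied scores are not grouped, and this is not the authors' evaluation code.

```python
def recall_at_precision(scores, labels, min_precision=0.5):
    """Highest recall reachable while precision >= min_precision.

    scores: predicted confidence per (protein, GO term) pair (assumed input).
    labels: 1 if the annotation is correct, 0 otherwise.
    """
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0

    best_recall = 0.0
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        precision = true_positives / rank  # fraction correct among top `rank`
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall
```

Comparing this value for two predictors at the same precision target is one way to express a statement such as "recall increases at 50% precision".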

Highlights

Functional annotation of genes is a fundamental problem in computational and experimental biology.
Pair-wise information between proteins, such as protein-protein interaction (PPI) data or co-expression information, is converted into a functional linkage graph, in which an edge between nodes represents evidence for protein function similarity.
Category information, such as protein motif information, mutant phenotype data, and protein localization data, is combined with the functional linkage graphs using a unified probabilistic framework.
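The two ideas in these highlights can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the paper's model: it links proteins that interact or whose expression profiles have a Pearson correlation above a hypothetical cutoff (`min_corr`), then combines neighbor evidence and category evidence in a naive Bayes style by adding log likelihood ratios to the prior log-odds. All function names, parameters (`p_link_f`, `p_link_not_f`, `feature_lrs`), and the independence assumption are introduced here for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_linkage_graph(ppi_pairs, expression, min_corr=0.8):
    """Undirected graph: edge if two proteins interact or are co-expressed."""
    graph = {protein: set() for protein in expression}
    for a, b in ppi_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    names = sorted(expression)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(expression[a], expression[b]) >= min_corr:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def posterior(protein, graph, known, prior, p_link_f, p_link_not_f,
              feature_lrs=()):
    """Posterior probability that `protein` has the function of interest.

    known: dict mapping annotated proteins to True/False for this function.
    p_link_f / p_link_not_f: assumed probability that a neighbor has the
        function, given the protein does / does not have it.
    feature_lrs: assumed likelihood ratios for category data such as
        motifs, phenotypes, or localization.
    """
    log_odds = math.log(prior / (1 - prior))
    for neighbor in graph.get(protein, ()):
        if neighbor not in known:
            continue  # unannotated neighbors contribute no evidence here
        if known[neighbor]:
            log_odds += math.log(p_link_f / p_link_not_f)
        else:
            log_odds += math.log((1 - p_link_f) / (1 - p_link_not_f))
    for ratio in feature_lrs:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))
```

Working in log-odds keeps each data source as a separate additive term, which makes it easy to see how much a given source moves the prediction for a particular functional category.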

Summary

Introduction

Functional annotation of genes is a fundamental problem in computational and experimental biology. Using PPI data to assign protein function has been extensively studied. These algorithms are often based on the "guilt by association" principle, which suggests that interacting neighbors in protein-protein interaction (PPI) networks might share a function [9,10,11]. Since such genome-wide data sets are inherently noisy, and each type of data captures only one aspect of cellular activity (e.g. gene expression data measure mRNA levels of transcriptionally induced genes, and PPI data suggest a feasible physical interaction between proteins), it is appealing to combine such heterogeneous data in an effort to improve the coverage and accuracy of protein function prediction.



