Integrating experimental and literature protein-protein interaction data for protein complex prediction.

Yijia Zhang,Hongfei Lin,Zhihao Yang,Jian Wang

doi:10.1186/1471-2164-16-s2-s4

Journal: BMC Genomics
Publication Date: Jan 21, 2015
Citations: 13
License type: cc-by

Affiliation: Dalian University of Technology

Abstract

Background: Accurate determination of protein complexes is crucial for understanding cellular organization and function. High-throughput experimental techniques have generated a large amount of protein-protein interaction (PPI) data, allowing prediction of protein complexes from PPI networks. However, the high-throughput data often include false positives and false negatives, making accurate prediction of protein complexes difficult.

Method: The biomedical literature contains large quantities of PPI data that, along with high-throughput experimental PPI data, are valuable for protein complex prediction. In this study, we employ a natural language processing technique to extract PPI data from the biomedical literature. These data are subsequently integrated with high-throughput PPI and gene ontology data by constructing attributed PPI networks, and a novel method for predicting protein complexes from the attributed PPI networks is proposed. This method allows calculation of the relative contribution of high-throughput and biomedical literature PPI data.

Results: Many well-characterized protein complexes are accurately predicted by this method when applied to two different yeast PPI datasets. The results show that (i) biomedical literature PPI data can effectively improve the performance of protein complex prediction; (ii) our method makes good use of high-throughput and biomedical literature PPI data along with gene ontology data to achieve state-of-the-art protein complex prediction capabilities.
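To make the idea of an attributed PPI network more concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple representation: each undirected edge records which data sources support it, and each protein carries a set of gene ontology terms. The function name `build_attributed_network`, the source labels, and the protein and GO identifiers are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def build_attributed_network(high_throughput, literature, go_terms):
    """Merge two PPI edge lists into one network (illustrative assumption only).

    Returns (edges, attributes):
      edges      maps frozenset({a, b}) -> set of source labels supporting the edge
      attributes maps protein -> set of GO terms (empty set if unannotated)
    """
    edges = defaultdict(set)
    labelled = (("high_throughput", high_throughput), ("literature", literature))
    for source, pairs in labelled:
        for a, b in pairs:
            if a == b:
                continue  # skip self-interactions
            # frozenset makes (a, b) and (b, a) the same undirected edge
            edges[frozenset((a, b))].add(source)

    proteins = {p for edge in edges for p in edge}
    attributes = {p: set(go_terms.get(p, ())) for p in proteins}
    return dict(edges), attributes


# Placeholder identifiers, not real proteins or GO terms.
ht_pairs = [("P1", "P2"), ("P2", "P3")]
lit_pairs = [("P2", "P1"), ("P3", "P4")]
go = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:B"}}

edges, attributes = build_attributed_network(ht_pairs, lit_pairs, go)
```

In this sketch an edge supported by both sources simply carries both labels. How the two sources are weighted against each other is left open here, since the abstract states only that the proposed method can calculate their relative contribution.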

Highlights

Accurate determination of protein complexes is crucial for understanding cellular organization and function
Many well-characterized protein complexes are accurately predicted by this method when applied to two different yeast protein-protein interaction (PPI) datasets
The results show that (i) biomedical literature PPI data can effectively improve the performance of protein complex prediction; (ii) our method makes good use of high-throughput and biomedical literature PPI data along with gene ontology data to achieve state-of-the-art protein complex prediction capabilities

Summary

Introduction

Accurate determination of protein complexes is crucial for understanding cellular organization and function. High-throughput experimental techniques have generated a large amount of protein-protein interaction (PPI) data, allowing prediction of protein complexes from PPI networks. The high-throughput data often include false positives and false negatives, making accurate prediction of protein complexes difficult. Protein complexes are formed from two or more associated polypeptide chains. Even in the relatively simple model organism Saccharomyces cerevisiae, protein complexes include many subunits that assemble and function in a coherent fashion. A key task of systems biology is to understand proteins and their interactions in terms of protein complexes [1]. Nepusz et al. proposed the ClusterONE algorithm [11], which detects overlapping protein complexes in PPI networks.

Methods

Results

Conclusion