Reconstruction of the experimentally supported human protein interactome: what can we learn?

Maria I Klapa, Nicholas K Moschonas, Athanasios Tsakalidis, Evangelos Theodoridis, Kalliopi Tsafou

doi:10.1186/1752-0509-7-96

Open Access

Abstract

Background: Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research; its systematic reconstruction is therefore required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim to assemble the human interactome in a global, structured way and to explore it to gain insights of biological relevance.

Results: First, we defined the UniProtKB manually reviewed human “complete” proteome as the reference protein-node set, and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the subsequent fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly connected interactors.

Conclusions: Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human “complete” proteome, and to discuss the role of the proteins within the network topology with respect to their function. As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, justifying the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms.
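The integration step described in the abstract (a reference node set fixed in advance, then PPIs kept only when both partners belong to it) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `integrate_ppis` and `node_degrees`, the dataset labels, and the single-letter protein identifiers are all hypothetical, and the sketch assumes that every source dataset has already been normalized to one identifier scheme.

```python
from collections import Counter

def integrate_ppis(reference, datasets):
    """Union of undirected PPIs whose two partners both belong to the reference set."""
    merged = set()
    for pairs in datasets.values():
        for a, b in pairs:
            if a in reference and b in reference:
                # Sorting makes (A, B) and (B, A) the same undirected PPI.
                merged.add(tuple(sorted((a, b))))
    return merged

def node_degrees(ppis):
    """Number of distinct interaction partners per protein (self-interactions ignored)."""
    degree = Counter()
    for a, b in ppis:
        if a != b:
            degree[a] += 1
            degree[b] += 1
    return degree

# Toy data: made-up identifiers, not real proteins or real source datasets.
reference = {"A", "B", "C", "D", "E"}
datasets = {
    "source_1": [("A", "B"), ("B", "C"), ("A", "X")],  # X is outside the reference set
    "source_2": [("B", "A"), ("A", "C"), ("A", "D")],  # (B, A) duplicates (A, B)
}

ppis = integrate_ppis(reference, datasets)
degree = node_degrees(ppis)

# Fraction of reference proteins that appear in at least one retained PPI.
covered = {protein for pair in ppis for protein in pair}
coverage = len(covered) / len(reference)

# Degree histogram: how many proteins have 1, 2, 3, ... partners.
degree_histogram = Counter(degree.values())
```

Fixing the node set beforehand is what makes the coverage figure well defined: the denominator does not change when a source dataset is added or updated. The degree histogram is only the starting point for inspecting a heavy-tailed distribution; the sketch does not fit a power law.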

Highlights

Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research; its systematic reconstruction is therefore required
For proper normalization of the source PPI datasets to the Universal Protein (UniProt) identifier level, it was important to consider the continuous updating of biological information, since it can lead to changes in the annotation of protein identifiers and in their associations at other molecular levels
UniProtKB and its cross-references with major resources at the nucleotide sequence and gene levels of molecular information (i.e. NCBI, Entrez Gene and EMBL databases) provided a valuable reference for the appropriate normalization of Human Protein Reference Database (HPRD) and Biological General Repository for Interaction Datasets (BioGRID) identifiers to the UniProt level, and of a small fraction of IntAct, Molecular INTeraction database (MINT) and Database of Interacting Proteins (DIP) protein entries that were not provided at the default UniProt level
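The normalization idea in the highlights above can be sketched in Python under stated assumptions. The two lookup tables (`to_uniprot`, a cross-reference from source identifiers to accessions, and `secondary_to_primary`, a map from outdated to current accessions) are hypothetical stand-ins for tables that would have to be built from UniProtKB cross-reference data. All identifiers are made up. Setting aside unmapped or ambiguous entries is an assumption of this sketch, not a rule stated in the source.

```python
def normalize_ppis(raw_ppis, to_uniprot, secondary_to_primary, reference):
    """Map each interaction partner to a single current reference accession.

    Returns (normalized, dropped): the set of undirected PPIs at the accession
    level, and the raw pairs that could not be mapped unambiguously.
    """
    normalized, dropped = set(), []
    for a, b in raw_ppis:
        mapped = []
        for identifier in (a, b):
            # Identifiers missing from the cross-reference table are treated
            # as accessions and checked against the reference set directly.
            candidates = to_uniprot.get(identifier, {identifier})
            # Replace outdated accessions with their current counterparts.
            candidates = {secondary_to_primary.get(acc, acc) for acc in candidates}
            # Keep only accessions that belong to the reference proteome.
            mapped.append(candidates & reference)
        if len(mapped[0]) == 1 and len(mapped[1]) == 1:
            pair = tuple(sorted((next(iter(mapped[0])), next(iter(mapped[1])))))
            normalized.add(pair)
        else:
            dropped.append((a, b))  # unmapped or ambiguous: set aside for review
    return normalized, dropped

# Toy tables with made-up identifiers.
to_uniprot = {
    "gene:1": {"ACC_A"},
    "gene:2": {"ACC_OLD_B"},       # cross-reference still points to an outdated accession
    "gene:3": {"ACC_C", "ACC_D"},  # one gene identifier, two candidate proteins
}
secondary_to_primary = {"ACC_OLD_B": "ACC_B"}
reference = {"ACC_A", "ACC_B", "ACC_C", "ACC_D"}

raw_ppis = [("gene:1", "gene:2"), ("gene:2", "gene:1"),
            ("gene:1", "gene:3"), ("gene:1", "gene:9")]
normalized, dropped = normalize_ppis(raw_ppis, to_uniprot, secondary_to_primary, reference)
```

Keeping the dropped pairs, rather than discarding them silently, makes it possible to revisit them when identifier annotations change in a later database release.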

Summary

Introduction

Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research; its systematic reconstruction is therefore required. The set of protein nodes of a meta-database network varies depending on the PPIs of the employed source datasets, and it may change upon updating or incorporation of new datasets. This creates heterogeneity between the various PPI meta-databases and hinders the direct comparison of their networks [11]. Because of this inherent heterogeneity, and although many studies have compared a variety of PPI datasets [10,11,12,13,14], the way in which the human protein interactome expands via the integration of multiple datasets has not been comprehensively explored; a global perspective of the biology emerging from the network structure remains elusive.

Objectives

Methods

Results

Conclusion