Abstract

It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, which present the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after being converted to a certain target level. Hence, the node set of the integrated network depends each time on the number and type of the combined datasets. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. PICKLE 2.0 builds upon two PICKLE principles: using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and filtering out protein interactions with low probability of being direct based on the available evidence. It first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries), and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking.
The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes.
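
The integration idea described above can be illustrated with a minimal Python sketch. It is not PICKLE's actual implementation: every identifier, the toy mapping tables, and the functions `to_proteins`, `to_genes` and `normalize` are hypothetical, chosen only to show how interactions kept at their reported level might be projected to the protein or gene level on demand while the original records stay untouched.

```python
from collections import defaultdict

# Toy ontology edges (hypothetical identifiers, not real accessions):
# each gene or mRNA is linked to the protein entries it gives rise to.
GENE_TO_PROTEINS = {"geneA": {"P1"}, "geneB": {"P2", "P3"}}
MRNA_TO_PROTEINS = {"mrnaB1": {"P2"}}

# Reverse links (protein -> genes), derived from the table above.
PROTEIN_TO_GENES = defaultdict(set)
for gene, proteins in GENE_TO_PROTEINS.items():
    for protein in proteins:
        PROTEIN_TO_GENES[protein].add(gene)

# Interactions as reported, each kept at its original level with its source.
RAW_PPIS = [
    ("P1", "P2", "sourceX"),         # reported at the protein level
    ("geneA", "mrnaB1", "sourceY"),  # reported at the gene / mRNA level
    ("P1", "P3", "sourceY"),
]


def to_proteins(entity):
    """Project an entity of any level onto the protein level."""
    if entity in GENE_TO_PROTEINS:
        return GENE_TO_PROTEINS[entity]
    if entity in MRNA_TO_PROTEINS:
        return MRNA_TO_PROTEINS[entity]
    return {entity}  # assumed to be a protein already


def to_genes(entity):
    """Project an entity of any level onto the gene level."""
    if entity in GENE_TO_PROTEINS:
        return {entity}
    return set().union(*(PROTEIN_TO_GENES[p] for p in to_proteins(entity)))


def normalize(raw_ppis, level):
    """Build a network at the requested level; raw_ppis is never modified."""
    project = to_proteins if level == "protein" else to_genes
    network = defaultdict(set)
    for a, b, source in raw_ppis:
        for x in project(a):
            for y in project(b):
                network[frozenset((x, y))].add(source)
    return dict(network)


protein_network = normalize(RAW_PPIS, "protein")
gene_network = normalize(RAW_PPIS, "gene")
```

In this toy data the two distinct protein-level pairs collapse into a single gene-level pair, which hints at why normalization is not a one-to-one mapping. Because `RAW_PPIS` is left as reported, either view can be rebuilt at any time, and the set of sources attached to each pair gives a simple handle for cross-checking between source databases.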

Highlights

  • Proteins play a fundamental role in the catalysis and regulation of cellular processes

  • In PICKLE 2.0, we define protein-protein interactions as links between polymorphic entities, instead of describing them as polymorphic relationships between classes of entities. This means that rather than accounting for all possible pairings of interactor identifier types encountered in the various source databases, interactions are defined as pairings of abstract entities at different levels of genetic reference, i.e. genes, nucleotide sequences, or proteins (UniProt entries), each of which may be described by various identifier types

  • These entities are interconnected through the genetic information flow and their relationships form the PICKLE genetic information ontology network (Fig 1). Having formed the latter, PICKLE replaces the typical means of primary protein-protein interaction (PPI) dataset integration, namely normalization, with ontological integration
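
The notion of interactions as links between polymorphic entities can be sketched as follows. This is an illustrative assumption about how such a model could look, not the PICKLE schema: the `Entity` and `Registry` classes, the identifier type names, and all identifier values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """An abstract interactor at one level of genetic reference."""
    level: str  # "gene", "mrna" or "protein"
    key: str    # hypothetical internal key


class Registry:
    """Maps (identifier type, identifier value) pairs to abstract entities."""

    def __init__(self):
        self._by_identifier = {}

    def register(self, entity, **identifiers):
        # One entity may be described by several identifier types.
        for id_type, value in identifiers.items():
            self._by_identifier[(id_type, value)] = entity

    def resolve(self, id_type, value):
        return self._by_identifier[(id_type, value)]


registry = Registry()
protein_x = Entity("protein", "X")
gene_y = Entity("gene", "Y")
registry.register(protein_x, uniprot="X_ACC", other_db="X_ALT")
registry.register(gene_y, gene_id="Y_ID", symbol="Y_SYM")

# Two sources report the same interaction using different identifier types.
record_1 = (registry.resolve("uniprot", "X_ACC"), registry.resolve("gene_id", "Y_ID"))
record_2 = (registry.resolve("other_db", "X_ALT"), registry.resolve("symbol", "Y_SYM"))

# Both records resolve to the same pair of abstract entities.
same_interaction = record_1 == record_2
```

With this design, the interaction model needs only one relation (entity to entity), and adding a new identifier type means registering more identifiers rather than introducing a new kind of pairing.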


Summary

Introduction

Proteins play a fundamental role in the catalysis and regulation of cellular processes.

PICKLE 2.0: A human PPI meta-database employing data integration via genetic information ontology

... sequence (mRNA)” sub-classes are formed using the UniProt-provided cross-references.

Results
Conclusion
