Abstract

Background: As numerous experimental factors drive the acquisition, identification, and interpretation of protein-protein interactions (PPIs), aggregated assemblies of human PPI data invariably contain experiment-dependent noise. Ascertaining the reliability of PPIs collected from these diverse studies and scoring them to infer high-confidence networks is a non-trivial task. Moreover, a large number of PPIs share the same number of reported occurrences, making it impossible to distinguish the reliability of these PPIs and rank-order them. For example, for the data analyzed here, we found that the majority (>83%) of currently available human PPIs have been reported only once.

Results: In this work, we proposed an unsupervised statistical approach to score a set of diverse, experimentally identified PPIs from nine primary databases to create subsets of high-confidence human PPI networks. We evaluated this ranking method by comparing it with other methods and assessing their ability to retrieve protein associations from a number of diverse and independent reference sets. These reference sets contain known biological data that are either directly or indirectly linked to interactions between proteins. We quantified the average effect of using ranked protein interaction data to retrieve this information and showed that, when compared to randomly ranked interaction data sets, the proposed method created a larger enrichment (~134%) than either ranking based on the hypergeometric test (~109%) or occurrence ranking (~46%).

Conclusions: From our evaluations, it was clear that ranked interactions were always of value because higher-ranked PPIs had a higher likelihood of retrieving high-confidence experimental data. Reducing the noise inherent in aggregated experimental PPIs via our ranking scheme further increased the accuracy and enrichment of PPIs derived from a number of biologically relevant data sets. These results suggest that using our high-confidence protein interactions at different levels of confidence will help clarify the topological and biological properties associated with human protein networks.
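
To make the tie problem of occurrence ranking concrete, the following is a minimal sketch and not the scoring method proposed in the paper. It assumes a hypothetical input format in which each reported interaction is a pair of protein identifiers, and the names `count_occurrences`, `fraction_reported_once`, and `tie_groups` are illustrative only. It counts how often each unordered pair is reported and shows how many pairs share the same count and therefore cannot be rank-ordered by occurrence alone.

```python
from collections import Counter


def count_occurrences(reports):
    """Count how many times each unordered protein pair is reported.

    `reports` is an iterable of (protein_a, protein_b) tuples; this input
    format is an assumption made for illustration.
    """
    counts = Counter()
    for a, b in reports:
        # frozenset makes (A, B) and (B, A) the same interaction.
        counts[frozenset((a, b))] += 1
    return counts


def fraction_reported_once(counts):
    """Fraction of distinct interactions that were reported exactly once."""
    if not counts:
        return 0.0
    once = sum(1 for c in counts.values() if c == 1)
    return once / len(counts)


def tie_groups(counts):
    """Map each occurrence count to the number of interactions sharing it."""
    return Counter(counts.values())


# Toy data with made-up protein names (not taken from the paper).
reports = [
    ("A", "B"), ("B", "A"), ("A", "B"),
    ("A", "C"), ("C", "D"), ("D", "E"),
]
counts = count_occurrences(reports)
once_fraction = fraction_reported_once(counts)
ties = tie_groups(counts)
```

In this toy input, three of the four distinct interactions are reported once, so occurrence ranking leaves them tied; a finer-grained score is needed to order them.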

Highlights

  • As numerous experimental factors drive the acquisition, identification, and interpretation of protein-protein interactions (PPIs), aggregated assemblies of human PPI data invariably contain experiment-dependent noise

  • The currently available individual PPI data sets can be roughly categorized into three sets: 1) proteome-wide, large-scale screenings aimed at investigating all possible PPIs [1,2,3], 2) semi-large-scale screenings aimed at investigating the interactions between a specific group of proteins and all other proteins [4,5], and 3)

  • We evaluated the improvement that could be gained by using the differently ranked data sets to rank-order the interactions in reference data sets that are presumed to be enriched with interacting proteins, i.e., in sets of proteins that share the same Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, are implicated in the same disease, share Gene Ontology (GO) function, or tissue mRNA expression levels


Summary

Introduction

As numerous experimental factors drive the acquisition, identification, and interpretation of protein-protein interactions (PPIs), aggregated assemblies of human PPI data invariably contain experiment-dependent noise. For the data analyzed here, we found that the majority (>83%) of currently available human PPIs have been reported only once. The currently available individual PPI data sets can be roughly categorized into three sets: 1) proteome-wide, large-scale screenings aimed at investigating all possible PPIs [1,2,3], 2) semi-large-scale screenings aimed at investigating the interactions between a specific group of proteins and all other proteins [4,5], and 3) small-scale, traditional studies aimed at detecting specific PPIs among biologically interesting proteins, e.g., oncogenes and their regulators. Although this latter set is still numerically dominant (~80% of all PPIs belong to this set), examples of the first two types of investigations are expanding rapidly. Given this extensive resource of known human PPIs and their continuous accelerated growth, how to globally analyze and aggregate the data remains a challenge.
