Protein-protein Interaction Datasets Research Articles

Background: M. tuberculosis is a formidable bacterial pathogen, so there is increasing demand for understanding the functions of, and relationships between, proteins in its various strains. Protein-protein interaction (PPI) data are crucial for this kind of knowledge. However, the quality of the main available M. tuberculosis PPI datasets is unclear, and this hampers the research that relies on them. Here, we analyze the two main available M. tuberculosis H37Rv PPI datasets. The first is the high-throughput bacterial two-hybrid (B2H) PPI dataset from Wang et al.'s recent paper in Journal of Proteome Research. The second is from the STRING database, version 8.3, and consists entirely of H37Rv PPIs predicted using various methods. We find that these two datasets have a surprisingly low level of agreement. We postulate the following causes: (i) the H37Rv B2H PPI dataset is of low quality; (ii) the H37Rv STRING PPI dataset is of low quality; and/or (iii) the H37Rv STRING PPIs are predictions of other forms of functional association rather than direct physical interactions.

Results: To test the quality of these two datasets, we evaluate them based on correlated gene expression profiles, coherent informative GO term annotations, and conservation in other organisms. A significantly greater proportion of PPIs in the H37Rv STRING PPI dataset (score ≥ 770) than in the H37Rv B2H PPI dataset have correlated gene expression profiles and coherent informative GO term annotations in both interaction partners. Predicted H37Rv interologs derived from non-M. tuberculosis experimental PPIs are much more similar to the H37Rv STRING functional associations dataset (score ≥ 770) than to the H37Rv B2H PPI dataset. H37Rv physical interologs predicted from IntAct also show extremely low similarity to the H37Rv B2H PPI dataset, far lower than the similarity between S. aureus MRSA252 physical interologs predicted from IntAct and S. aureus MRSA252 pull-down PPIs. Comparative analysis with several representative two-hybrid PPI datasets in other species further confirms that the H37Rv B2H PPI dataset is of low quality. Next, to test the possibility that the H37Rv STRING PPIs are not purely direct physical interactions, we compare M. tuberculosis H37Rv protein pairs that catalyze adjacent steps in enzymatic reactions with the B2H PPIs and with the predicted PPIs in STRING; these enzyme pairs are much less similar to the B2H PPIs than to the STRING PPIs. This result strongly suggests that the H37Rv STRING PPIs are more likely to correspond to indirect relationships between protein pairs than to B2H-type interactions. For more precise support, we turn to S. cerevisiae, whose interactome has been comprehensively studied. We compare S. cerevisiae predicted PPIs in STRING with three independent protein relationship datasets, comprising, respectively, PPIs reported in Y2H assays, protein pairs reported to be in the same protein complexes, and protein pairs that catalyze successive steps in enzymatic reactions. Our analysis reveals that S. cerevisiae predicted STRING PPIs are much more similar to the latter two types of protein pairs than to two-hybrid PPIs. Because the H37Rv STRING PPIs are predicted using methods similar to those used for the S. cerevisiae STRING PPIs, this suggests that the H37Rv STRING PPIs are likewise more likely to correspond to the latter two types of protein pairs than to two-hybrid PPIs.

Conclusions: The H37Rv B2H PPI dataset has low quality and should not be used as the gold standard to assess the quality of other (possibly predicted) H37Rv PPI datasets. The H37Rv STRING PPI dataset also has low quality; nevertheless, the subset of STRING PPIs with score ≥ 770 has satisfactory quality. However, these STRING "PPIs" should be interpreted as functional associations, which include a substantial portion of indirect protein interactions, rather than as direct physical interactions. These two factors cause the strikingly low similarity between the two main H37Rv PPI datasets. The results and conclusions of this comparative analysis provide valuable guidance for using these M. tuberculosis H37Rv PPI datasets in subsequent studies for a wide range of purposes.
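
The abstract reports agreement between the B2H and STRING datasets without stating the similarity measure. The minimal Python sketch below shows one way such a comparison could be computed. It assumes each dataset is a list of protein-identifier pairs, treats interactions as undirected, and uses Jaccard similarity (shared pairs divided by all distinct pairs) as the measure. These are assumptions, not the authors' stated method. The helper names (normalize, filter_by_score, jaccard, coverage) and the toy identifiers p1 to p6 are hypothetical; only the STRING score cut-off of 770 comes from the text.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Store each undirected interaction once, so (A, B) and (B, A) count as the same pair."""
    return {(a, b) if a <= b else (b, a) for a, b in pairs}


def filter_by_score(scored: Iterable[Tuple[str, str, int]], threshold: int = 770) -> Set[Pair]:
    """Keep only scored associations at or above the threshold (e.g. STRING score >= 770)."""
    return normalize((a, b) for a, b, score in scored if score >= threshold)


def jaccard(x: Set[Pair], y: Set[Pair]) -> float:
    """Shared pairs divided by all distinct pairs in either dataset (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def coverage(reference: Set[Pair], other: Set[Pair]) -> float:
    """Fraction of the reference dataset's pairs that also appear in the other dataset."""
    return len(reference & other) / len(reference) if reference else 0.0


# Hypothetical toy data, not taken from the B2H or STRING datasets.
b2h = normalize([("p1", "p2"), ("p3", "p4"), ("p2", "p1")])
string_hc = filter_by_score([("p2", "p1", 900), ("p5", "p6", 800), ("p3", "p4", 500)])

print(sorted(b2h & string_hc))             # [('p1', 'p2')]
print(round(jaccard(b2h, string_hc), 3))   # 0.333
print(coverage(b2h, string_hc))            # 0.5
```

Jaccard similarity is symmetric and is pulled down when the two datasets differ greatly in size. The asymmetric coverage helper instead answers a directional question, such as what fraction of B2H pairs also appear among high-confidence STRING associations.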

Background: Understanding cellular systems requires knowledge of each protein's subcellular localization (SCL). Although experimental and predicted data for protein SCL are archived in various databases, SCL prediction remains a non-trivial problem in genome annotation. Current SCL prediction tools use amino-acid sequence features and text-mining approaches. A comprehensive analysis of protein SCL in human PPI and metabolic networks across the various subcellular compartments is necessary for developing a robust SCL prediction methodology.

Results: Using protein-protein interaction (PPI) and metabolite-linked protein interaction (MLPI) networks, we compared, contrasted and analysed their statistical properties across different subcellular compartments. We integrated PPI and metabolic datasets with SCL information for human proteins from LOCATE and GOA (Gene Ontology Annotation) and applied three statistical analyses: Pearson's chi-square (χ²) test, the Paired Localisation Correlation Profile (PLCP) and network topological measures. For the PPI network, the chi-square test shows that interacting protein pairs share the same SCL category about twice as often as expected, compared with non-interacting protein pairs (χ² = 1270.19, P-value < 2.2 × 10⁻¹⁶). For the MLPI network, metabolite-linked protein pairs share the same SCL 20% more often than expected, compared with non-metabolite-linked protein pairs (χ² = 110.02, P-value < 2.2 × 10⁻¹⁶). To address proteins with multiple SCLs, we specifically used the PLCP measure. PLCP analysis revealed that protein interactions are largely restricted to the same SCL, although significant cross-compartment interactions are seen for nuclear proteins. Metabolite-linked protein pairs are restricted to specific compartments such as the mitochondrion (P-value < 6.0 × 10⁻⁷), the lysosome (P-value < 4.7 × 10⁻⁵) and the Golgi apparatus (P-value < 1.0 × 10⁻¹⁵). These findings indicate that the metabolic network adds value to the information in the PPI network for understanding the localisation of proteins in human subcellular compartments.

Conclusions: The MLPI network differs significantly from the PPI network in its SCL distribution. The PPI network shows passive protein interactions across different subcellular compartments, possibly because of its high false-positive rate; such interactions appear to be absent from the MLPI network, which has evolved to maintain high substrate specificity.
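
To illustrate the chi-square comparison in the abstract, here is a minimal Python sketch of Pearson's chi-square test on a 2×2 contingency table (interacting vs non-interacting pairs, same vs different SCL). It assumes a 2×2 layout with one degree of freedom and no Yates continuity correction; the abstract does not state either choice. The function name and the counts are hypothetical. The counts were chosen so that same-SCL interacting pairs exceed expectation by 20%, and they are not the study's data.

```python
import math
from typing import List, Tuple


def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float, List[List[float]]]:
    """Pearson's chi-square test for a 2x2 table, without Yates' continuity correction.

    Rows: interacting / non-interacting pairs; columns: same SCL / different SCL.
    Returns (statistic, p-value, expected counts under independence).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell if interaction status and shared SCL were independent.
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected


# Hypothetical counts, not from the LOCATE/GOA data.
observed = [[60, 40],   # interacting pairs: same SCL, different SCL
            [40, 60]]   # non-interacting pairs: same SCL, different SCL
stat, p, expected = chi_square_2x2(observed)
ratio = observed[0][0] / expected[0][0]   # observed / expected for interacting, same-SCL pairs
print(stat, round(p, 5), ratio)           # 8.0 0.00468 1.2
```

The observed-to-expected ratio (1.2 here) is the kind of quantity behind statements such as "20% more than expected". For real analyses, an established implementation is preferable. Examples are R's `chisq.test`, which applies Yates' correction to 2×2 tables by default, and SciPy's `scipy.stats.chi2_contingency`.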

Articles published on Protein-protein Interaction Datasets

Abstract 4929: Dissection of protein-protein interaction-mediated cross-talk pathways in hepatocellular carcinoma

QiSampler: evaluation of scoring schemes for high-throughput datasets using a repetitive sampling strategy on gold standards

Triathlon for energy functions: Who is the winner for design of protein–protein interactions?

Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets

Conditional random field approach to prediction of protein-protein interactions using domain information

An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology

Network analysis of human protein location

ProHits: integrated software for mass spectrometry–based interaction proteomics

Semantic and layered protein function prediction from PPI networks

The systematic annotation of the three main GPCR families in Reactome

Incorporating multiple genomic features with the utilization of interacting domain patterns to improve the prediction of protein–protein interactions

Human Protein Structural Interaction Network: Domain Effects on Network Topology and Protein Function

Utilizing shared interacting domain patterns and Gene Ontology information to improve protein–protein interaction prediction

Exploratory analysis of protein translation regulatory networks using hierarchical random graphs

A Comprehensive Resource of Interacting Protein Regions for Refining Human Transcription Factor Networks

Prediction of protein functions based on function–function correlation relations

Analysis of protein complexes through model‐based biclustering of label‐free quantitative AP‐MS data

A discriminative approach for identifying domain–domain interactions from protein–protein interactions

Inferring protein function by domain context similarities in protein-protein interaction networks

Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods
