Neglog: Homology-Based Negative Data Sampling Method for Genome-Scale Reconstruction of Human Protein-Protein Interaction Networks.

Suyu Mei, Kun Zhang

doi:10.3390/ijms20205075

Abstract

Rapid reconstruction of genome-scale protein–protein interaction (PPI) networks is instrumental in understanding cellular processes, disease pathogenesis, and drug reactions. However, the lack of experimentally verified negative data (i.e., pairs of proteins that do not interact) remains a major issue that needs to be properly addressed in computational modeling. In this study, we take advantage of the very limited experimentally verified negative data from Negatome to infer more negative data for computational modeling. We assume that the paralogs or orthologs of two non-interacting proteins also do not interact with high probability. We call this assumption “Neglog”; it is to some extent supported by paralogous/orthologous structure conservation. To reduce the risk of bias toward the negative data from Negatome, we combine Neglog with less biased random sampling according to a certain ratio to construct training data. L2-regularized logistic regression is used as the base classifier to counteract noise and train on a large dataset. Computational results show that the proposed Neglog method outperforms the pure random sampling method and offers sound biological interpretability. In addition, we find that an independent test on negative data is indispensable for bias control, a step that is usually neglected by existing studies. Lastly, we use the Neglog method to validate the PPIs in STRING, which are supported by gene ontology (GO) enrichment analyses.
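The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `homologs` mapping, and the reading of the ratio as "random pairs per inferred pair" are assumptions made here for illustration, and the abstract does not state the ratio actually used. The example pair (PKN1, RPS6KA1) with paralog AKT1 is taken from the Highlights; P1, P2, and P3 are hypothetical placeholder proteins.

```python
import random
from itertools import product


def infer_neglogs(negatives, homologs, positives=()):
    """Infer new negative pairs from homologs of known non-interacting pairs."""
    known_neg = {frozenset(p) for p in negatives}
    known_pos = {frozenset(p) for p in positives}
    inferred = set()
    for a, b in negatives:
        # Each partner may be kept or swapped for one of its paralogs/orthologs.
        cands_a = {a} | set(homologs.get(a, ()))
        cands_b = {b} | set(homologs.get(b, ()))
        for x, y in product(cands_a, cands_b):
            pair = frozenset((x, y))
            # Skip self-pairs, the seed negatives, and known interactions.
            if x != y and pair not in known_neg and pair not in known_pos:
                inferred.add(pair)
    return inferred


def random_negatives(proteins, n, exclude, rng, max_tries=10_000):
    """Draw up to n random pairs that are absent from `exclude`."""
    pool = sorted(proteins)
    sampled = set()
    tries = 0
    while len(sampled) < n and tries < max_tries:
        tries += 1
        pair = frozenset(rng.sample(pool, 2))
        if pair not in exclude:
            sampled.add(pair)
    return sampled


def build_negative_set(negatives, homologs, proteins, positives, ratio, seed=0):
    """Combine inferred pairs with random pairs (assumed: ratio = random per inferred)."""
    rng = random.Random(seed)
    neglogs = infer_neglogs(negatives, homologs, positives)
    # Assumption: seed negatives are excluded so they can serve as held-out test data.
    exclude = (neglogs
               | {frozenset(p) for p in positives}
               | {frozenset(p) for p in negatives})
    n_random = round(ratio * len(neglogs))
    return neglogs | random_negatives(proteins, n_random, exclude, rng)


# Toy data: one experimentally verified negative pair and one paralog relation.
negatives = [("PKN1", "RPS6KA1")]
homologs = {"PKN1": ["AKT1"]}
proteins = ["AKT1", "PKN1", "RPS6KA1", "P1", "P2", "P3"]  # P1..P3 are hypothetical
positives = [("P1", "P2")]                                # hypothetical known PPI

neglogs = infer_neglogs(negatives, homologs, positives)
print(sorted(tuple(sorted(p)) for p in neglogs))

training_negatives = build_negative_set(
    negatives, homologs, proteins, positives, ratio=2
)
print(len(training_negatives))
```

Excluding the seed negatives from the training set is one way to keep them available for the independent test on negative data that the abstract calls for; whether the authors partition their data this way is not stated here.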

Highlights

Protein–protein interaction (PPI) is one of the central research topics in experimental and computational biology.
We can infer that the core structures of the paralog pair (AKT1, PKN1) do not vary much. These results show that the structural mismatch between PKN1 and RPS6KA1 is conserved in the Neglog (AKT1, RPS6KA1) via paralogous structure conservation between PKN1 and AKT1.
The performance decrease is not large and is still acceptable. These results show that random sampling, a commonly used method, is still a good solution for computational modeling of biological problems when the required experimental negative data are not available.

Summary

Introduction

Protein–protein interaction (PPI) is one of the central research topics in experimental and computational biology. Recent years have witnessed the rapid accumulation of PPI data in various databases, e.g., HPRD [1], BioGrid [2], Reactome [3], KEGG [4], IntAct [5], HitPredict [6], STRING [7], DIP [8], and BIND [9]. The PPI experimental techniques, including X-ray crystallography, yeast two-hybrid, mass spectrometry, and affinity purification, are credible in general, yet they can exhibit a high false positive rate and low agreement with each other [12]. Although much effort has been devoted to computational reconstruction of intra-species [13,14,15,16,17,18] and inter-species [19,20,21,22,23] PPI networks, several major issues still need to be properly addressed.

Methods

Results

Conclusion

Journal: International Journal of Molecular Sciences
Publication Date: Oct 12, 2019
Citations: 9
License type: CC BY 4.0

