Abstract

Rapid reconstruction of genome-scale protein–protein interaction (PPI) networks is instrumental in understanding cellular processes, disease pathogenesis, and drug reactions. However, the lack of experimentally verified negative data (i.e., pairs of proteins that do not interact) remains a major issue that needs to be properly addressed in computational modeling. In this study, we take advantage of the very limited experimentally verified negative data from Negatome to infer more negative data for computational modeling. We assume that the paralogs or orthologs of two non-interacting proteins also, with high probability, do not interact. We call this assumption “Neglog”; it is to some extent supported by paralogous/orthologous structure conservation. To reduce the risk of bias toward the negative data from Negatome, we combine Neglog with less biased random sampling according to a certain ratio to construct training data. L2-regularized logistic regression is used as the base classifier to counteract noise and train on a large dataset. Computational results show that the proposed Neglog method outperforms the pure random sampling method with sound biological interpretability. In addition, we find that independent testing on negative data is indispensable for bias control, which is usually neglected by existing studies. Lastly, we use the Neglog method to validate the PPIs in STRING, which are supported by gene ontology (GO) enrichment analyses.
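The negative-data construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `random_per_neglog` ratio parameter, the homolog map, and the placeholder proteins `P4` and `P5` are all assumptions made for illustration. Treating each protein as its own homolog, so that only one partner of a seed pair needs to be replaced, is likewise an assumption. The classifier step is not covered.

```python
import random
from itertools import combinations, product


def neglog_pairs(seed_negatives, homologs):
    """Infer negatives by swapping partners of verified negatives with homologs."""
    seeds = {frozenset(p) for p in seed_negatives}
    inferred = set()
    for a, b in seed_negatives:
        # Assumption: each protein counts as its own homolog, so pairs in
        # which only one partner is replaced are generated as well.
        homs_a = {a} | set(homologs.get(a, ()))
        homs_b = {b} | set(homologs.get(b, ()))
        for x, y in product(homs_a, homs_b):
            if x != y:
                inferred.add(frozenset((x, y)))
    # Keep only the newly inferred pairs, not the seeds themselves.
    return inferred - seeds


def random_negatives(proteins, excluded, n, rng):
    """Sample n unordered protein pairs that are not in `excluded`."""
    # Enumerating all pairs is quadratic; acceptable for a toy example only.
    candidates = [
        frozenset(pair)
        for pair in combinations(sorted(proteins), 2)
        if frozenset(pair) not in excluded
    ]
    return set(rng.sample(candidates, min(n, len(candidates))))


def build_negatives(seed_negatives, homologs, positives, proteins,
                    random_per_neglog=1.0, seed=0):
    """Combine Neglog-style negatives with random negatives at a given ratio."""
    rng = random.Random(seed)
    pos = {frozenset(p) for p in positives}
    seeds = {frozenset(p) for p in seed_negatives}
    # Never label a known interacting pair as negative.
    neglog = neglog_pairs(seed_negatives, homologs) - pos
    n_random = round(random_per_neglog * len(neglog))
    rand = random_negatives(proteins, pos | seeds | neglog, n_random, rng)
    return neglog, rand


# Toy input: the gene names come from the highlights; everything else is made up.
neglog, rand = build_negatives(
    seed_negatives=[("PKN1", "RPS6KA1")],
    homologs={"PKN1": ["AKT1"]},          # hypothetical paralog map
    positives=[("P4", "P5")],             # hypothetical known interaction
    proteins=["AKT1", "PKN1", "RPS6KA1", "P4", "P5"],
    random_per_neglog=2.0,                # hypothetical mixing ratio
)
print(sorted(tuple(sorted(p)) for p in neglog))
print(sorted(tuple(sorted(p)) for p in rand))
```

In this toy run the only inferred pair is (AKT1, RPS6KA1), and the randomly sampled pairs never overlap with the known positives, the seed negatives, or the inferred pairs. Pairs are stored as `frozenset` objects so that (A, B) and (B, A) count as the same pair.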

Highlights

  • Protein–protein interaction (PPI) is one of the central research topics in experimental and computational biology

  • We can infer that the core structures of the paralog pair (AKT1, PKN1) do not differ much. These results show that the structural mismatch between PKN1 and RPS6KA1 is conserved across the Neglog (AKT1, RPS6KA1) via paralogous structure conservation between PKN1 and AKT1

  • The performance decrease is not large and is still acceptable. These results show that random sampling, as a commonly used method, is still a good solution for computational modeling of biological problems when the required experimental negative data are not available

Summary

Introduction

Protein–protein interaction (PPI) is one of the central research topics in experimental and computational biology. Recent years have witnessed the rapid accumulation of PPI data in various databases, e.g., HPRD [1], BioGrid [2], Reactome [3], KEGG [4], IntAct [5], HitPredict [6], STRING [7], DIP [8], BIND [9], etc. The PPI experimental techniques, including X-ray crystallography, yeast two-hybrid, mass spectrometry, and affinity purification, are generally considered credible. Nevertheless, these techniques exhibit a high false-positive rate and low agreement with each other [12]. Although much effort has been devoted to computational reconstruction of intra-species [13,14,15,16,17,18] and inter-species [19,20,21,22,23] PPI networks, there are still several major issues that need to be properly addressed.
