Improving Compound-Protein Interaction Prediction by Self-Training with Augmenting Negative Samples.

Takuto Koyama, Hiroaki Iwata, Ryosuke Kojima, Shigeyuki Matsumoto, Yasushi Okuno

doi:10.1021/acs.jcim.3c00269

Open Access

https://doi.org/10.1021/acs.jcim.3c00269

Abstract

Identifying compound-protein interactions (CPIs) is crucial for drug discovery. Since experimentally validating CPIs is often time-consuming and costly, computational approaches are expected to facilitate the process. The rapid growth of available CPI databases has accelerated the development of many machine-learning methods for CPI prediction. However, their performance, particularly their generalizability to external data, often suffers from a data imbalance attributed to the lack of experimentally validated inactive (negative) samples. In this study, we developed a self-training method that augments the training data with negative samples that are both credible and informative, to improve the performance of models impaired by data imbalance. The constructed model demonstrated higher performance than models constructed with other conventional methods for addressing data imbalance, and the improvement was most prominent on external datasets. Moreover, examination of the prediction score thresholds used for pseudo-labeling during self-training revealed that augmenting the training data with samples that have ambiguous prediction scores is beneficial for constructing a model with high generalizability. The present study provides guidelines for improving CPI predictions on real-world data, thus facilitating drug discovery.
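The general idea of self-training with negative pseudo-labels can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional toy scorer `fit_centroid_scorer`, the function `self_train_negatives`, the score band `low`/`high`, and the synthetic data are all hypothetical choices made only for illustration. The sketch repeatedly fits a model, pseudo-labels unlabeled samples whose predicted score falls inside a chosen band as negatives, and retrains on the augmented set.

```python
import math
import random


def fit_centroid_scorer(X, y):
    """Toy stand-in for a CPI model (assumption): scores a 1-D feature
    with a sigmoid centred between the positive and negative class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    mid = (mu_pos + mu_neg) / 2.0
    scale = (mu_pos - mu_neg) or 1.0  # avoid division by zero

    def score(x):
        return 1.0 / (1.0 + math.exp(-4.0 * (x - mid) / scale))

    return score


def self_train_negatives(X_lab, y_lab, X_unl, low=0.0, high=0.3, rounds=3):
    """Add unlabeled samples with scores in [low, high) as pseudo-negatives.

    The band [low, high) is a hypothetical threshold; widening `high`
    admits samples with more ambiguous scores.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unl)
    for _ in range(rounds):
        score = fit_centroid_scorer(X_train, y_train)
        picked = [x for x in pool if low <= score(x) < high]
        if not picked:
            break  # nothing left inside the band
        pool = [x for x in pool if not (low <= score(x) < high)]
        X_train.extend(picked)
        y_train.extend([0] * len(picked))  # pseudo-label as negative
    return fit_centroid_scorer(X_train, y_train), X_train, y_train, pool


# Synthetic, imbalanced toy data: 20 positives, 2 negatives, 50 unlabeled.
random.seed(0)
X_lab = [random.gauss(2.0, 0.5) for _ in range(20)]
X_lab += [random.gauss(-2.0, 0.5) for _ in range(2)]
y_lab = [1] * 20 + [0] * 2
X_unl = [random.gauss(-1.5, 1.0) for _ in range(50)]

model, X_aug, y_aug, remaining = self_train_negatives(X_lab, y_lab, X_unl)
print(len(y_aug) - len(y_lab), "pseudo-negatives added;", len(remaining), "left unlabeled")
```

Raising `high` toward 0.5 in this sketch admits samples with more ambiguous scores, which is the kind of threshold choice the abstract reports examining; the values used here are placeholders, not the paper's settings.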

Journal: Journal of Chemical Information and Modeling
Publication Date: Jul 17, 2023
License type: CC BY-NC-ND 4.0
