Benchmarking Small-Dataset Structure-Activity-Relationship Models for Prediction of Wnt Signaling Inhibition

Mahtab Kokabi, Matthew Donnelly, Guangyu Xu

doi:10.1109/access.2020.3046190


Open Access

https://doi.org/10.1109/access.2020.3046190


Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 60	License type: CC BY 4.0

Abstract

Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools for expediting drug discovery and therapeutics development. Given the cost of acquiring large training datasets, it is useful to examine whether QSAR analysis can reasonably predict drug activity from only a small dataset (size < 100) and to benchmark such small-dataset QSAR models in application-specific studies. To this end, we present a systematic benchmarking study of small-dataset QSAR models built to predict effective Wnt signaling inhibitors, which are essential to therapeutics development for prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models built from 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths (4 × 6 × 3 = 72). We trained these models on a training dataset (56 compounds), benchmarked their performance on 4 figures-of-merit (FOMs), and examined their prediction accuracy on an external validation dataset (14 compounds). Our data show that model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions caused by class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines for developing high-performance small-dataset QSAR models for drug activity prediction.
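The benchmarking design above can be sketched as a grid of model configurations scored on a confusion matrix. This is a minimal illustration, not the authors' code: the algorithm names follow the Highlights, but the fingerprint names and bit lengths are hypothetical placeholders, and because the abstract does not name its 4 FOMs, accuracy, sensitivity, specificity, and precision are assumed here.

```python
from itertools import product

# Algorithm names follow the Highlights; the fingerprint names and bit lengths
# below are HYPOTHETICAL placeholders, since the abstract does not list them.
ALGORITHMS = ["QSVM", "fine tree", "bagged tree", "RUSBoosted tree"]
FINGERPRINTS = ["fp_A", "fp_B", "fp_C", "fp_D", "fp_E", "fp_F"]
FP_LENGTHS = [512, 1024, 2048]


def model_grid():
    """Enumerate every (algorithm, fingerprint, length) combination."""
    return list(product(ALGORITHMS, FINGERPRINTS, FP_LENGTHS))


def figures_of_merit(tp, fp, tn, fn):
    """Four common binary-classification metrics from a confusion matrix.

    ASSUMPTION: the abstract does not name its 4 FOMs; accuracy,
    sensitivity, specificity, and precision are used for illustration.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on active compounds
        "specificity": ratio(tn, tn + fp),  # recall on inactive compounds
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    grid = model_grid()
    print(len(grid))  # 4 algorithms x 6 fingerprints x 3 lengths = 72
    # Hypothetical counts for a 14-compound validation set.
    print(figures_of_merit(tp=3, fp=1, tn=9, fn=1))
```

With a hypothetical predictor that labels all 14 validation compounds inactive (tp=0, fp=0, tn=10, fn=4), accuracy is still about 0.71 while sensitivity drops to 0. This is one way class imbalance can hide false predictions unless all FOMs are checked together.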

Highlights

Drug development often involves extensive investment and time spent on experimental screening of drug candidates.
Our data show that model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions caused by class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs.
Using the fingerprint representations of the 56 compounds in the training dataset with known activity for Wnt signaling inhibition, we developed predictive QSAR models based on four machine learning algorithms: quadratic support vector machine (QSVM), fine tree, bagged tree, and RUSBoosted tree.
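RUSBoost combines random undersampling (RUS) of the majority class with boosting, which is one way to counter the class imbalance mentioned above. The sketch below shows only the undersampling step in plain Python; it is an illustration under that assumption, not the implementation the authors used, and the function name `random_undersample` and the toy dataset are hypothetical.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes until every class has as
    many samples as the smallest class (the "RUS" step of RUSBoost).

    Returns new (samples, labels) lists; the inputs are not modified.
    """
    rng = random.Random(seed)
    target = min(Counter(labels).values())

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    kept = []
    for indices in by_class.values():
        # Keep a random subset of size `target` from each class.
        kept.extend(rng.sample(indices, target))
    kept.sort()  # preserve the original sample order

    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical imbalanced training set: 12 inactive (0) vs 4 active (1).
    X = [f"compound_{i}" for i in range(16)]
    y = [0] * 12 + [1] * 4
    X_bal, y_bal = random_undersample(X, y)
    print(Counter(y_bal))
```

In full RUSBoost, a fresh undersampled set like this is drawn at every boosting round, so different majority-class samples contribute across rounds instead of being discarded once.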

Summary

Introduction

Drug development often involves extensive investment and time spent on experimental screening of drug candidates. Computational methods based on three-dimensional quantitative structure-activity relationship (3D QSAR) analysis, high-throughput imaging (HTI), and pharmacophore modeling [5]-[10] have succeeded in predicting the effectiveness of drug compounds against prevalent human diseases (e.g., cancer [10]). These high-performance methods, however, often require user intervention for molecular/ligand alignment [5], [8], [9]. QSAR analysis correlates the structural details of drug molecules with their effectiveness in biological assays that correspond to specific diseases, and builds models that can predict the bioactivity or physicochemical properties of unknown drug compounds [1], [2], [3], [6].
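To make the structure-to-activity idea concrete, here is a minimal sketch that treats a 2D fingerprint as a set of on-bit positions and predicts a query compound's label from its most similar training compound by Tanimoto similarity. This nearest-neighbour rule is a swapped-in illustration, not one of the four algorithms benchmarked in the paper, and the fingerprints and labels are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets
    of on-bit positions: |A & B| / |A | B|."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def predict_activity(query_fp, training_set):
    """1-nearest-neighbour prediction: return the label of the training
    compound most similar to the query, together with that similarity.

    training_set is a list of (fingerprint, label) pairs.
    """
    best_fp, best_label = max(
        training_set, key=lambda item: tanimoto(query_fp, item[0])
    )
    return best_label, tanimoto(query_fp, best_fp)


if __name__ == "__main__":
    # Hypothetical 16-bit fingerprints (sets of on-bit indices) and labels.
    training = [
        ({0, 3, 5, 9, 12}, "active"),
        ({1, 2, 7, 8, 14}, "inactive"),
        ({0, 3, 6, 9, 13}, "active"),
    ]
    query = {0, 3, 5, 9, 12, 13}
    print(predict_activity(query, training))
```

The set representation keeps the example dependency-free; with a cheminformatics toolkit, the same similarity would normally be computed on bit vectors generated from molecular structures.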



Similar Papers


3D MI-DRAGON: New Model for the Reconstruction of US FDA Drug-Target Network and Theoretical-Experimental Studies of Inhibitors of Rasagiline Derivatives for AChE
Francisco Prado-Prado ... Olga Caamano
Current Topics in Medicinal Chemistry | VOL. 12
13 Nov 2012

Prediction on the mutagenicity of nitroaromatic compounds using quantum chemistry descriptors based QSAR and machine learning derived classification methods
Yuxing Hao ... Yongzhen Peng
Ecotoxicology and Environmental Safety | VOL. 186
18 Oct 2019

Development of 2D and 3D Quantitative Structure Activity Relationship Models of Thiazole Derivatives for Antimicrobial Activity
Majid Shabbir Khan ... Mohamad Taleuzzaman
International Journal of Pharmaceutical Sciences and Drug Research | VOL. 14
30 Mar 2022

Comparison of MLR, PLS and GA-MLR in QSAR analysis*
A.K Saxena ... P Prathipati
SAR and QSAR in Environmental Research | VOL. 14
01 Oct 2003
