Identification of Pharmacophoric Fragments of DYRK1A Inhibitors Using Machine Learning Classification Models.

Mengzhou Bi, Tengjiao Fan, Jianhua Wang, Rugang Zhong, Guohui Sun, Na Zhang, Lijiao Zhao, Zhen Guan

doi:10.3390/molecules27061753

Abstract

Dual-specificity tyrosine phosphorylation-regulated kinase 1A (DYRK1A) is regarded as a potential therapeutic target for neurodegenerative diseases, and considerable progress has been made in the discovery of DYRK1A inhibitors. Identifying pharmacophoric fragments provides valuable information for the structure- and fragment-based design of potent and selective DYRK1A inhibitors. In this study, seven machine learning methods and five molecular fingerprints were used to develop qualitative classification models of DYRK1A inhibitors. The models were evaluated by cross-validation, a test set, and an external validation set using four performance indicators: predictive classification accuracy (CA), area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), and balanced accuracy (BA). The PubChem fingerprint-support vector machine model (CA = 0.909, AUC = 0.933, MCC = 0.717, BA = 0.855) and the PubChem fingerprint-artificial neural network model (CA = 0.862, AUC = 0.911, MCC = 0.705, BA = 0.870) were considered the optimal models for the training set and test set, respectively. A hybrid data balancing method, SMOTETL, which combines the synthetic minority over-sampling technique (SMOTE) and Tomek link (TL) algorithms, was applied to explore the impact of balanced learning on model performance. Pharmacophoric fragments related to DYRK1A inhibition were also identified on the basis of frequency analysis and information gain. These results provide theoretical support and clues for the screening and design of novel DYRK1A inhibitors.
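
The abstract names information gain as one of the two criteria for picking out pharmacophoric fragments. As a rough illustration of that idea (not the authors' implementation), the sketch below computes the information gain of a single fingerprint bit with respect to an active/inactive label. The function names and the toy data are hypothetical and do not come from the paper.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        h -= p * math.log2(p)
    return h


def information_gain(bit_present, labels):
    """Entropy reduction in `labels` after splitting on one fingerprint bit.

    bit_present: 0/1 flags, one per compound (fragment present or absent).
    labels:      class labels, one per compound (e.g. 1 = active, 0 = inactive).
    """
    with_bit = [y for b, y in zip(bit_present, labels) if b]
    without_bit = [y for b, y in zip(bit_present, labels) if not b]
    n = len(labels)
    conditional = (
        len(with_bit) / n * entropy(with_bit)
        + len(without_bit) / n * entropy(without_bit)
    )
    return entropy(labels) - conditional


# Hypothetical toy data: four compounds, two active and two inactive.
toy_labels = [1, 1, 0, 0]
informative_bit = [1, 1, 0, 0]    # fragment occurs only in the actives
uninformative_bit = [1, 0, 1, 0]  # fragment occurs equally in both classes

ig_informative = information_gain(informative_bit, toy_labels)
ig_uninformative = information_gain(uninformative_bit, toy_labels)
```

A bit that separates actives from inactives perfectly recovers the full label entropy, while a bit spread evenly across both classes has zero gain. Ranking bits by this value is one plausible way to shortlist candidate fragments before a frequency analysis.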

Highlights

Protein kinases regulate cellular functions by transferring a phosphate group to proteins [1]
For a comprehensive evaluation, five-fold cross-validation, a test set, and an external test set were used to assess the developed classification models based on statistical parameters including true positive (TP), true negative (TN), false positive (FP), false negative (FN), sensitivity (SE), specificity (SP), classification accuracy (CA), and balanced accuracy (BA) [36]
Based on model performance in five-fold cross-validation and on the test set, the PubChem fingerprint gave the best models for the training set and test set, with AUC values of 0.933 and 0.911 when combined with the support vector machine (SVM) and artificial neural network (ANN) algorithms, respectively
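
The statistical parameters listed in the highlights all follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions of SE, SP, CA, BA, and MCC. The function name and the example counts are illustrative assumptions, not values reported in the paper.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute SE, SP, CA, BA and MCC from confusion-matrix counts."""

    def safe_div(num, den):
        # Return 0.0 when a class is empty instead of raising an error.
        return num / den if den else 0.0

    se = safe_div(tp, tp + fn)                   # sensitivity (recall on actives)
    sp = safe_div(tn, tn + fp)                   # specificity (recall on inactives)
    ca = safe_div(tp + tn, tp + tn + fp + fn)    # classification accuracy
    ba = (se + sp) / 2                           # balanced accuracy
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )                                            # Matthews correlation coefficient
    return {"SE": se, "SP": sp, "CA": ca, "BA": ba, "MCC": mcc}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = classification_metrics(tp=40, tn=40, fp=10, fn=10)
```

BA and MCC are reported alongside CA because plain accuracy can look good on an imbalanced dataset even when the minority class is poorly predicted.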

Summary

Introduction

Protein kinases regulate cellular functions by transferring a phosphate group to proteins [1]. Developing a QSAR model of DYRK1A inhibitors with diverse chemical scaffolds could provide general and comprehensive molecular information, or privileged substructures, that determine their inhibitory activity. Because they are not limited to data samples from a single chemical scaffold, classification studies that combine machine learning methods with molecular features [19,20] are applicable to DYRK1A inhibitors with diverse heterocyclic scaffolds and broad-spectrum bioactivities. Most points in the plot fell in the green area (around 0.4), which indicated that the dataset was highly diverse and that models trained on such data can have strong generalization ability. The chemical space of the whole dataset was investigated by PCA of featured molecular descriptors, the four Lipinski-rule descriptors, and the number of rotatable bonds.
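
The summary does not say which similarity measure lies behind the "around 0.4" value used to judge dataset diversity. The sketch below assumes a pairwise Tanimoto coefficient on binary fingerprints, a common choice in cheminformatics. That choice, the function names, and the toy fingerprints are all assumptions for illustration.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    # Two empty fingerprints are treated as dissimilar here; this is a convention.
    return len(a & b) / union if union else 0.0


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto coefficient over all unordered compound pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints: each set holds the indices of the bits that are on.
toy_fingerprints = [{1, 2, 3}, {2, 3, 4}, {5, 6}]
mean_similarity = mean_pairwise_similarity(toy_fingerprints)
```

Under this assumption, a low average pairwise similarity means the compounds share few substructures, which is the sense in which a dataset would be called diverse.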

Results

Performance topSet

Predicted Results of External Validation Set

Improved Performance of Balanced Models

Identification and Analysis of Feature Substructures

Data Collection and

Molecular Fingerprints and Machine Learning Methods

Model Performance Evaluation

Identification of Privileged Substructures

Molecular Docking

Conclusions

Methods

Journal: Molecules (Basel, Switzerland)
Publication Date: Mar 8, 2022
License type: CC BY 4.0
