Abstract

Dual-specificity tyrosine phosphorylation-regulated kinase 1A (DYRK1A) is regarded as a potential therapeutic target for neurodegenerative diseases, and considerable progress has been made in the discovery of DYRK1A inhibitors. Identifying pharmacophoric fragments provides valuable information for the structure- and fragment-based design of potent and selective DYRK1A inhibitors. In this study, seven machine learning methods and five molecular fingerprints were used to develop qualitative classification models of DYRK1A inhibitors. The models were evaluated by cross-validation, a test set, and an external validation set using four performance indicators: classification accuracy (CA), the area under the receiver operating characteristic curve (AUC), the Matthews correlation coefficient (MCC), and balanced accuracy (BA). The PubChem fingerprint-support vector machine model (CA = 0.909, AUC = 0.933, MCC = 0.717, BA = 0.855) and the PubChem fingerprint-artificial neural network model (CA = 0.862, AUC = 0.911, MCC = 0.705, BA = 0.870) were considered the optimal models for the training set and the test set, respectively. A hybrid data balancing method, SMOTETL, which combines the synthetic minority over-sampling technique (SMOTE) with the Tomek link (TL) algorithm, was applied to explore the impact of balanced learning on model performance. Pharmacophoric fragments related to DYRK1A inhibition were also identified based on frequency analysis and information gain. These results provide theoretical support and clues for the screening and design of novel DYRK1A inhibitors.
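
To make the information-gain criterion mentioned in the abstract concrete, the following is a minimal sketch of how the information gain of a single substructure could be computed for a binary active/inactive classification. It uses the standard entropy-based definition, which may differ in detail from the procedure used in the study; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of binary labels (1 = active, 0 = inactive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_active = sum(labels) / n
    # Skip zero-probability classes, since 0 * log2(0) is taken as 0.
    return -sum(p * math.log2(p) for p in (p_active, 1.0 - p_active) if p > 0)


def information_gain(has_fragment, labels):
    """Reduction in label entropy obtained by splitting compounds on fragment presence.

    has_fragment: list of 0/1 flags, one per compound (hypothetical fingerprint bit).
    labels:       list of 0/1 activity labels, one per compound.
    """
    n = len(labels)
    with_frag = [y for x, y in zip(has_fragment, labels) if x]
    without_frag = [y for x, y in zip(has_fragment, labels) if not x]
    conditional = (
        len(with_frag) / n * entropy(with_frag)
        + len(without_frag) / n * entropy(without_frag)
    )
    return entropy(labels) - conditional


# Toy data (assumed for illustration only): four compounds, two active.
labels = [1, 1, 0, 0]
informative_bit = [1, 1, 0, 0]    # present exactly in the actives
uninformative_bit = [1, 0, 1, 0]  # present equally in actives and inactives

ig_informative = information_gain(informative_bit, labels)
ig_uninformative = information_gain(uninformative_bit, labels)
```

In this toy case the fragment that occurs only in the actives removes all uncertainty about the label, while the fragment that is evenly spread across both classes removes none. A fragment with high information gain and a high frequency among actives would be a candidate pharmacophoric fragment under this reading.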

Highlights

  • Protein kinases regulate cellular functions by transferring a phosphate group to proteins [1]

  • To evaluate the models comprehensively, five-fold cross-validation, a test set, and an external test set were used to assess the developed classification models based on statistical parameters including true positive (TP), true negative (TN), false positive (FP), false negative (FN), sensitivity (SE), specificity (SP), classification accuracy (CA), and balanced accuracy (BA) [36]

  • Based on five-fold cross-validation and test set performance, the PubChem fingerprint gave the best models for the training set and the test set, with AUC values of 0.933 and 0.911 when combined with the support vector machine (SVM) and artificial neural network (ANN) algorithms, respectively
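
The statistical parameters listed in the highlights can be illustrated with a short sketch that derives SE, SP, CA, BA, and MCC from confusion-matrix counts. The formulas are the standard textbook definitions and are assumed to match those used in the study; the function name and the example counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute SE, SP, CA, BA and MCC from confusion-matrix counts."""
    se = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity: recall on actives
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity: recall on inactives
    ca = (tp + tn) / (tp + tn + fp + fn)        # classification accuracy
    ba = (se + sp) / 2                          # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is set to 0 when any marginal total is zero (a common convention).
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "CA": ca, "BA": ba, "MCC": mcc}


# Hypothetical counts, not results from the paper.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Unlike CA, both BA and MCC take the two classes into account separately, which is why they are useful alongside plain accuracy when actives and inactives are imbalanced.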

Summary

Introduction

Protein kinases regulate cellular functions by transferring a phosphate group to proteins [1]. Developing a QSAR model of DYRK1A inhibitors with diverse chemical scaffolds could provide general and comprehensive molecular information, or privileged substructures, that determine their inhibitory activity. Because they are not limited to data from a single chemical scaffold, classification studies that combine machine learning methods with molecular features [19,20] are applicable to DYRK1A inhibitors with diverse heterocyclic scaffolds and broad-spectrum bioactivities. Most data points were distributed in the green area (around 0.4), which indicates that the dataset is highly diverse and that models trained on such data can generalize well. The chemical space of the whole dataset was investigated by principal component analysis (PCA) of featured molecular descriptors, the four Lipinski rule descriptors, and the number of rotatable bonds.

Results
Performance topSet
Predicted Results of External Validation Set
Improved Performance of Balanced Models
Identification and Analysis of Feature Substructures
Data Collection and
Molecular Fingerprints and Machine Learning Methods
Model Performance Evaluation
Identification of Privileged Substructures
Molecular Docking
Conclusions
Methods