Applying Machine Learning Techniques for Classifying Cyclin-Dependent Kinase Inhibitors

Ibrahim Z Abdelbaky, Ahmed F, Amr A

doi:10.14569/ijacsa.2018.091132

Abstract

Protein kinases are important targets in many drug design studies because they play an essential role in cell cycle development and many other biological processes. Kinases are divided into subfamilies according to the type and mode of their enzymatic activity. Computational identification of kinase inhibitors is widely used for modelling kinase-inhibitor interactions. Such modelling is expected to help solve the selectivity problem that arises from the high similarity between kinases and their binding profiles. In this study, we explore the ability of two machine-learning techniques to classify compounds as inhibitors or non-inhibitors of two members of the cyclin-dependent kinase (CDK) subfamily of protein kinases. Random forest (RF) and genetic programming (GP) were used to classify inhibitors of the CDK5 and CDK2 kinases. The classification is based on calculated values of chemical descriptors. In addition, the response of the classifiers to adding prior information about compound promiscuity was investigated. The results from each classifier were analyzed for each dataset using several performance measures: confusion matrices, accuracy, ROC curves, AUC values, F1 scores, and the Matthews correlation coefficient. The analysis showed better performance for the RF classifier in most cases. In addition, the results show that promiscuity information improves classification accuracy, and that its effect was significant and most clearly visible with the GP classifiers.
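The scalar measures named in the abstract (accuracy, F1 score, and the Matthews correlation coefficient) can all be derived from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard formulas; the function name `binary_metrics` and the counts passed to it are hypothetical and are not results from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, F1 score, and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four cells of the confusion matrix.
    # By convention it is set to 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Because MCC accounts for true negatives as well as true positives, it can differ noticeably from accuracy and F1 when the inhibitor and non-inhibitor classes are imbalanced.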

Highlights

Many important biological processes in the human body are related to the process of phosphorylation
We extracted the values for the first 1497 compounds against two protein kinases belonging to the cyclin-dependent kinase subfamily, CDK2 and CDK5
[...] classifier more than its improvement for the Random Forest (RF) classifier at the training-set level. This improvement is clearly noticeable in the Genetic Programming (GP) results for the test sets, although GP accuracy is still low on the test sets compared to RF

Summary

INTRODUCTION

Many important biological processes in the human body are related to the process of phosphorylation. Computer-based approaches are being used to help profile the activity of different inhibitors against kinases and to explore and tackle the selectivity problem. Among these techniques is machine learning, which is widely used in biological and medical problems. Genetic Programming (GP) [14] is a machine learning technique that simulates biological evolution and is used for modelling by regression or classification. It starts from a random population and produces successive generations of individuals by applying evolutionary operations such as mutation, crossover, and selection, with the aim of improving a fitness function. We use genetic programming and random forest classification techniques to classify inhibitors and non-inhibitors of two of the cyclin-dependent kinases, CDK5 and CDK2. Both techniques were used to model chemical descriptor information.
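The evolutionary loop described above (random initial population, then selection, crossover, and mutation guided by a fitness function) can be illustrated with a toy Python sketch. This is not the authors' implementation: the expression-tree encoding, tournament selection, elitism, the node cap, the rule that an output above zero means "inhibitor", and the synthetic two-descriptor data are all assumptions made for illustration.

```python
import operator
import random

FUNCTIONS = [operator.add, operator.sub, operator.mul]  # assumed function set
LEAVES = ("x", "c")  # "x" = descriptor index, "c" = constant


def random_tree(n_features, depth, rng):
    """Grow a random expression tree over descriptor indices and constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    return (rng.choice(FUNCTIONS),
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))


def evaluate(tree, row):
    """Evaluate a tree on one row of descriptor values."""
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return tree[0](evaluate(tree[1], row), evaluate(tree[2], row))


def size(tree):
    """Count the nodes in a tree."""
    return 1 if tree[0] in LEAVES else 1 + size(tree[1]) + size(tree[2])


def fitness(tree, X, y):
    """Training accuracy; an output above 0 is read as class 1 (assumption)."""
    hits = sum((evaluate(tree, row) > 0) == bool(label) for row, label in zip(X, y))
    return hits / len(y)


def random_subtree(tree, rng):
    """Pick a subtree by walking down random branches."""
    while tree[0] not in LEAVES and rng.random() < 0.5:
        tree = rng.choice(tree[1:])
    return tree


def vary(tree, make_replacement, rng):
    """Replace one randomly chosen subtree with make_replacement()."""
    if tree[0] in LEAVES or rng.random() < 0.2:
        return make_replacement()
    func, left, right = tree
    if rng.random() < 0.5:
        return (func, vary(left, make_replacement, rng), right)
    return (func, left, vary(right, make_replacement, rng))


def evolve(X, y, pop_size=30, generations=20, max_nodes=50, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    population = [random_tree(n, 3, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = [(fitness(t, X, y), t) for t in population]
        best_score, best = max(scored, key=lambda pair: pair[0])
        history.append(best_score)
        next_gen = [best]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fittest of three random individuals.
            p1 = max(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda pair: pair[0])[1]
            # Crossover: graft a random subtree of p2 into p1.
            child = vary(p1, lambda: random_subtree(p2, rng), rng)
            if rng.random() < 0.2:
                # Mutation: replace a subtree with a freshly grown one.
                child = vary(child, lambda: random_tree(n, 2, rng), rng)
            # Cap tree size to limit bloat; fall back to the parent if exceeded.
            next_gen.append(child if size(child) <= max_nodes else p1)
        population = next_gen
    best = max(population, key=lambda t: fitness(t, X, y))
    return best, history + [fitness(best, X, y)]


# Synthetic data for illustration only: class 1 when descriptor 0 exceeds descriptor 1.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(40)]
y = [1 if row[0] > row[1] else 0 for row in X]
best_tree, history = evolve(X, y)
```

Here `history` records the best training accuracy per generation; with elitism it never decreases. In practice the fitness would be computed on real chemical descriptors and checked on a held-out test set.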

Data Sources

Data Preparation

Genetic Programming Classification

Methodology

Random Forest Classification

RESULTS AND DISCUSSION

Accuracy

Confusion Matrix

ROC Curves

F1 Score

Matthews Correlation Coefficient

Score Training

Important Variables

CONCLUSION

Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2018
License type: cc-by
