Abstract

Quantitative structure-activity relationships and quantitative structure-property relationships have proved useful for predicting the toxicity of drug molecules with respect to their biological activities. In silico toxicity prediction techniques are essential for reducing in vivo testing on rodents, and they offer a faster, more cost-efficient way to identify toxic effects at an early stage of drug development. We aim to build a prediction model that quickly and efficiently tests whether chemical compounds have the potential to disrupt processes in the human body in ways that may adversely affect human health. Here, we propose a computational (in silico) method that predicts the toxicity of small drug molecules that can bind to the aryl hydrocarbon receptor, using their physicochemical properties (molecular descriptors). Pharmaceutical data exploration laboratory software is used to extract the features of the drug molecules. The aryl hydrocarbon receptor dataset contains 9008 drug molecules, of which 1063 are active and 7945 are inactive, and each drug molecule is described by 1444 features. The proposed model is a novel ensemble learning approach that efficiently classifies the active (binding) and inactive (nonbinding) compounds of the dataset. We first perform feature selection using the Boruta library in R. We then resolve the class imbalance problem within the ensemble itself by dividing the dataset into seven data frames, each with approximately equal numbers of active and inactive drug molecules. The ensemble, based on the votes of seven random forest models, achieves an accuracy of 93.76%. K-fold cross-validation is conducted to measure the consistency of the model. Finally, the validity of the proposed ensemble model is demonstrated for some drug molecules of acquired immune deficiency syndrome therapy and the androgen receptor.
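The balancing and voting scheme described in the abstract can be illustrated with a minimal sketch. It rests on an assumption the abstract does not state: that each of the seven data frames pairs all 1063 active molecules with one seventh of the 7945 inactive molecules (about 1135), which would give the "approximately equal" class counts mentioned. The names `balanced_frames`, `majority_vote`, and `ensemble_predict` are hypothetical, and the base learner is left as any callable rather than a trained random forest.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# (feature vector, label); label 1 = active (binding), 0 = inactive
Sample = Tuple[Sequence[float], int]


def balanced_frames(samples: List[Sample], n_frames: int = 7,
                    seed: int = 0) -> List[List[Sample]]:
    """Assumed scheme: split the inactive (majority) class into n_frames
    chunks and pair each chunk with every active (minority) sample."""
    actives = [s for s in samples if s[1] == 1]
    inactives = [s for s in samples if s[1] == 0]
    random.Random(seed).shuffle(inactives)
    # Striding yields chunks whose sizes differ by at most one.
    chunks = [inactives[i::n_frames] for i in range(n_frames)]
    return [actives + chunk for chunk in chunks]


def majority_vote(predictions: Sequence[int]) -> int:
    """Return the label chosen by most models; an odd number of models
    rules out ties for binary labels."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(models: Sequence[Callable[[Sequence[float]], int]],
                     x: Sequence[float]) -> int:
    """Ask every model for a label and combine the answers by vote."""
    return majority_vote([model(x) for model in models])
```

In practice, each frame would be used to train one random forest (for example with the `randomForest` package in R or scikit-learn's `RandomForestClassifier`), and the seven trained models would take the place of the callables passed to `ensemble_predict`.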

Highlights

  • Most drugs are small molecules that are invented to interact with, bind, and regulate the activity of specific biological receptors

  • In this paper, we propose an efficient ensemble-based computational method for predicting the toxicity of drug molecules that activate the aryl hydrocarbon receptor signaling pathway

  • It is a decision support system that predicts the toxicity of unknown drug molecules acting on the aryl hydrocarbon receptor (AhR); uploading the structure-data file (SDF) of any single drug molecule returns its toxicity prediction


Summary

Introduction

Most drugs are small molecules that are designed to interact with, bind, and regulate the activity of specific biological receptors. Drwal et al. proposed molecular similarity-based and naive Bayes classification for predicting toxicity in the nuclear receptor and stress response pathways, using compounds screened in the Tox data challenge of 2014. It was implemented in KNIME software [12]. Capuzzi et al. built QSAR models for 12 stress response and nuclear receptor signaling pathway toxicity assays as part of the 2014 Tox challenge. These models were built using random forest, deep neural networks, and various combinations of descriptors; the deep neural networks performed better. The drawback of this methodology is its high demand for computational resources [14].

Materials and methods
Target class
Proposed ensemble-based prediction model
Phase 1
Phase 2
Phase 3
Phase 4
Phase 5
Gini coefficient
Specificity
Accuracy
Validation of the proposed ensemble model
Findings
Conclusion