Abstract

Three machine learning classifiers (random forest, decision tree, and support vector machine) were used to build predictive models from an anti-mycobacterial ChEMBL dataset and were evaluated for their predictive capability. Before the models were developed, the data were pre-processed to address class imbalance by applying a cost-sensitive classifier and by filtering data instances with the supervised synthetic minority oversampling technique (SMOTE), spread subsample, and resample methods. The statistical evaluation indicated that the random forest model performed best, with the highest accuracy (93.83%), specificity (90.5%), receiver operating characteristic (ROC) area (0.984), Matthews correlation coefficient (MCC, 0.772), and kappa statistic (0.768) among the models, whereas LibSVM showed the highest sensitivity (94.4%). Additionally, toxicity predictive models based on the SingleCellcall DSSTox carcinogenicity database (AID1189) were developed, and the random forest model was again the best. Deploying both random forest (RF) models on two unknown datasets identified 1317 of 1554 approved drugs and 2234 of 18,746 compounds in a ChEMBL anti-malarial dataset as both non-toxic and anti-mycobacterial. Machine learning models thus offer an efficient way to find novel anti-mycobacterial hit compounds. We suggest that such techniques could be useful for screening drug candidates not only for tuberculosis but also for other diseases.
