Machine learning – Predicting Ames mutagenicity of small molecules

Charmaine S.M Chu, Jack D Simpson, Paul M O'Neill, Neil G Berry

doi:10.1016/j.jmgm.2021.108011

Abstract

In modern drug discovery, detecting a compound's potential mutagenicity is crucial. However, the traditional Ames test for mutagenicity is costly and time-consuming, because each compound must first be synthesised and then tested, and its results are not always accurate or reproducible. It would therefore be advantageous to develop robust in silico models that accurately predict a compound's mutagenicity prior to synthesis, overcoming these shortcomings of the Ames test. After curating a previously defined compound mutagenicity library, we calculated chemical fingerprints and molecular properties for over 5000 molecules. Using eight classification algorithms, including support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGB), we constructed a total of 112 predictive models. Model performance was assessed using 10-fold cross-validation and a hold-out test set, and several of the top-performing models were further assessed using the y-randomisation approach. SVM and XGB models performed well both in 10-fold cross-validation (AUROC >0.90, sensitivity >0.85, specificity >0.75, balanced accuracy >0.80, Kappa >0.65) and on the hold-out test set (AUROC >0.65, sensitivity >0.65, specificity >0.60, balanced accuracy >0.65, Kappa >0.30). We also identified the molecular properties that are most influential for mutagenicity prediction when combined with chemical fingerprints. Finally, using the Class A mutagenic compounds from the Ames/QSAR International Challenge Project, we verified that our models outperform the StarDrop Ames mutagenicity prediction and the TEST mutagenicity prediction, correctly predicting more mutagens than either.
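The abstract reports five performance metrics (AUROC, sensitivity, specificity, balanced accuracy and Cohen's Kappa). As a rough illustration of how these are conventionally defined, the sketch below computes them from binary labels in plain Python. It is not the authors' code: the function names `classification_metrics` and `auroc` are hypothetical, label 1 is assumed to mean "Ames mutagen" (the positive class) and 0 "non-mutagen", both classes are assumed to be present, and AUROC is computed with the pairwise-ranking (Mann-Whitney) formulation rather than by integrating a ROC curve.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, balanced accuracy and Cohen's kappa.

    Assumption (hypothetical convention): 1 = mutagen, 0 = non-mutagen,
    and both classes occur in y_true and y_pred.
    """
    counts = Counter(zip(y_true, y_pred))
    tp, fn = counts[(1, 1)], counts[(1, 0)]  # mutagens predicted right / wrong
    tn, fp = counts[(0, 0)], counts[(0, 1)]  # non-mutagens predicted right / wrong
    n = tp + fn + tn + fp

    sensitivity = tp / (tp + fn)  # fraction of mutagens correctly flagged
    specificity = tn / (tn + fp)  # fraction of non-mutagens correctly cleared
    balanced_accuracy = (sensitivity + specificity) / 2

    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_observed = (tp + tn) / n
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random mutagen outscores a random
    non-mutagen (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
    print(classification_metrics(y_true, y_pred))
    # → {'sensitivity': 0.75, 'specificity': 0.75, 'balanced_accuracy': 0.75, 'kappa': 0.5}
    print(auroc(y_true, scores))
    # → 0.9375
```

Balanced accuracy and Kappa are both less sensitive than plain accuracy to an uneven ratio of mutagens to non-mutagens, which is one reason such metrics are commonly reported alongside AUROC for this kind of classification task.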
