Abstract

Random forest, support vector machine, logistic regression, neural network and k-nearest neighbor (lazar) algorithms were applied to a new Salmonella mutagenicity dataset with 8,290 unique chemical structures, using MolPrint2D and Chemistry Development Kit (CDK) descriptors. Cross-validation accuracies of all investigated models ranged from 80 to 85%, which is comparable with the interlaboratory variability of the Salmonella mutagenicity assay. Pyrrolizidine alkaloid predictions showed a clear distinction between chemical groups: otonecines had the highest proportion of positive mutagenicity predictions and monoesters the lowest.

Highlights

  • The assessment of mutagenicity is an important part of the safety assessment of chemical structures, because mutations may lead to cancer and germ cell damage

  • The new training data can be downloaded from https://git.in-silico.ch/mutagenicitypaper/tree/mutagenicity/mutagenicity.csv

  • A new public Salmonella mutagenicity training dataset with 8,309 experimental results was created and used to train lazar and TensorFlow models with MolPrint2D and Chemistry Development Kit (CDK) descriptors


Summary

Introduction

The assessment of mutagenicity is an important part of the safety assessment of chemical structures, because mutations may lead to cancer and germ cell damage. Computer-based (in silico) mutagenicity predictions can be used in the early screening of novel compounds (e.g., drug candidates), and they are also gaining regulatory acceptance, e.g. for the registration of industrial chemicals within REACH (European Chemical Agency, 2017) or the assessment of impurities in pharmaceuticals (ICH, 2017). Mutagenicity is the toxicological endpoint with the largest amount of public data, covering almost 10,000 structures, whereas datasets for other endpoints typically contain only a few hundred compounds. The Ames test itself is relatively reproducible, with an interlaboratory reproducibility of 80–85% (Piegorsch and Zeiger, 1991). This makes the development of mutagenicity models interesting from a computational chemistry and machine learning point of view. The relatively large amount of public data reduces the probability of chance effects due to small sample sizes, and the reliability of the underlying assay reduces the risk of overfitting experimental errors.
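To make the idea of a cross-validated accuracy for a nearest-neighbor mutagenicity model more concrete, the following is a minimal sketch, not the lazar implementation. It assumes that each compound is represented as a set of fingerprint features, that similarity is measured with the Tanimoto coefficient, and that the prediction is a simple majority vote. The feature names and the toy dataset are hypothetical and serve only to make the example runnable.

```python
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(query, training, k=3):
    """Majority vote over the k most similar training compounds.

    `training` is a list of (fingerprint_set, label) pairs, label 1 = mutagenic.
    """
    neighbours = sorted(
        training, key=lambda item: tanimoto(query, item[0]), reverse=True
    )[:k]
    positives = sum(label for _, label in neighbours)
    return 1 if positives * 2 > len(neighbours) else 0


def cross_validate(data, folds=10, k=3, seed=0):
    """Return the accuracy of the kNN classifier under k-fold cross-validation."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    correct = 0
    for i in range(folds):
        # Every `folds`-th compound forms the held-out test fold.
        test = shuffled[i::folds]
        train = [item for j, item in enumerate(shuffled) if j % folds != i]
        correct += sum(knn_predict(fp, train, k) == label for fp, label in test)
    return correct / len(shuffled)


def toy_dataset():
    """Hypothetical, perfectly separable toy data (not taken from the paper)."""
    data = []
    for i in range(20):
        data.append((frozenset({"aromatic_nitro", "ring", f"m{i % 4}"}), 1))
        data.append((frozenset({"hydroxyl", "chain", f"n{i % 4}"}), 0))
    return data


if __name__ == "__main__":
    accuracy = cross_validate(toy_dataset())
    print(f"10-fold cross-validation accuracy: {accuracy:.2f}")
```

On real data, an accuracy obtained this way would be compared with the 80–85% interlaboratory figure quoted above, since a model can hardly be expected to agree with the assay more often than the assay agrees with itself.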

