KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development.

Robert Landsiedel, Andrea Volkamer, Klaus-Juergen Schleifer, Antje Wolf, Janosch H Achenbach, Andrea Morger, Roland Buesen, Miriam Mathea

doi:10.1186/s13321-020-00422-x

Abstract

Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds: machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, which after preprocessing contains a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, which comprises an additional calibration step and by definition creates internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified for the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could in turn be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, combining the complementary outputs of different approaches can have a higher impact on guiding toxicity testing and on de-selecting the most likely harmful development-candidate compounds early in the development process.
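To make the calibration step and the terms validity and efficiency more concrete, the following is a minimal sketch of an inductive conformal classifier in plain Python. It is not the KnowTox implementation. The function names, the class-wise calibration, the use of "1 minus predicted class probability" as nonconformity score, and the definition of efficiency as the fraction of single-label predictions are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical illustration, not the KnowTox code. Nonconformity scores are
# assumed to be "1 - predicted probability of the class" from some underlying
# model, and calibration is assumed to be done separately per class.


def p_value(calibration_scores: List[float], test_score: float) -> float:
    """Share of calibration scores at least as nonconforming as the test score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(
    calibration: Dict[str, List[float]],
    test_scores: Dict[str, float],
    significance: float,
) -> Set[str]:
    """Keep every label whose p-value exceeds the significance level."""
    return {
        label
        for label, score in test_scores.items()
        if p_value(calibration[label], score) > significance
    }


def validity(sets: List[Set[str]], true_labels: List[str]) -> float:
    """Fraction of prediction sets that contain the true label."""
    hits = sum(1 for s, y in zip(sets, true_labels) if y in s)
    return hits / len(sets)


def efficiency(sets: List[Set[str]]) -> float:
    """Fraction of prediction sets that contain exactly one label."""
    return sum(1 for s in sets if len(s) == 1) / len(sets)


# Toy calibration scores per class (made-up numbers).
calibration = {
    "active": [0.1, 0.2, 0.3, 0.4],
    "inactive": [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85],
}
# Nonconformity of one query compound with respect to each class.
query = {"active": 0.8, "inactive": 0.1}

strict = prediction_set(calibration, query, significance=0.25)
lenient = prediction_set(calibration, query, significance=0.10)
```

In this toy example the query compound receives the single label `inactive` at a significance level of 0.25, whereas at 0.10 both labels are retained. This shows the trade-off described above: a lower significance level demands fewer errors but yields less informative, and therefore less efficient, predictions.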

Highlights

Before newly developed chemicals can be approved, their potential toxic effects on humans and the environment must be assessed.
Given the ever-growing amount of available toxicity data, computational toxicity prediction methods have great potential to reduce time, cost, and animal testing (Morger et al., J Cheminform (2020) 12:24).
The full spectrum of predictions provided by KnowTox will be showcased on two triazoles.
Scientists from 25 research groups have contributed to consensus models for androgen receptor binding, agonism, and antagonism with a predictive accuracy of 78% for the androgen receptor antagonism (AA) evaluation set, which is in the same range as the conformal prediction (CP) accuracy (SCP) obtained for the original KnowTox-AA model (see Table 3).

Summary

Introduction

Given the ever-growing amount of available toxicity data, computational toxicity prediction methods have great potential to reduce time, cost, and animal testing. Using historical data, they can help to disclose relationships between compounds that would not have been identified manually and reveal potential risks of compounds in early phases of development. In silico predictions can hint at potentially hazardous interactions or critical structural moieties of new molecules. They can also support product optimisation and reduce long-term animal toxicity studies [3, 4]. In silico strategies for supporting risk assessment range from computational read-across approaches and the search for substructural alerts to statistical methods. Quantitative structure-activity relationship (QSAR) techniques such as machine learning (ML) [5] methods require a large precompiled dataset

Objectives

Methods

Results

Conclusion