Abstract

Background: Determination of acute toxicity, expressed as median lethal dose (LD50), is one of the most important steps in the drug discovery pipeline. Because in vivo assays for oral acute toxicity in mammals are time-consuming and costly, there is an urgent need to develop in silico prediction models of oral acute toxicity.

Results: In this study, based on a comprehensive data set containing 7314 diverse chemicals with rat oral LD50 values, the relevance vector machine (RVM) technique was employed to build regression models for predicting oral acute toxicity in rats. These models were compared with those built using six other machine learning approaches: k-nearest-neighbor regression, random forest (RF), support vector machine, local approximate Gaussian process, multilayer perceptron ensemble, and eXtreme gradient boosting. A subset of the original molecular descriptors and structural fingerprints (PubChem or SubFP) was chosen using the chi-squared statistic. The prediction capabilities of the individual QSAR models, measured by q_ext² for the test set containing 2376 molecules, ranged from 0.572 to 0.659.

Conclusion: Considering the overall prediction accuracy for the test set, RVM with a Laplacian kernel and RF were recommended for building in silico models with better predictivity for rat oral acute toxicity. By combining the predictions from individual models, four consensus models were developed, yielding better prediction capabilities for the test set (q_ext² = 0.669–0.689). Finally, some essential descriptors and substructures relevant to oral acute toxicity were identified and analyzed; they may serve as property or substructure alerts to avoid toxicity. We believe that the best consensus model, with its high prediction accuracy, can be used as a reliable virtual screening tool to filter out compounds with high rat oral acute toxicity.

Graphical abstract: Workflow of combinatorial QSAR modelling to predict rat oral acute toxicity
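The consensus step and the external validation statistic mentioned in the abstract can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not confirm: that a consensus prediction is the unweighted mean of the individual model predictions, and that the external q² is computed as 1 − PRESS/SS with SS taken around the training-set mean (one common definition among several). The function names `consensus_predict` and `q2_ext` are hypothetical.

```python
from statistics import mean


def consensus_predict(model_predictions):
    """Average per-compound predictions across several models.

    model_predictions: list of equal-length lists, one list per model.
    Assumption: unweighted arithmetic mean; the paper may combine differently.
    """
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("all models must predict the same number of compounds")
    # zip(*...) groups the predictions made for the same compound
    return [mean(per_compound) for per_compound in zip(*model_predictions)]


def q2_ext(y_true, y_pred, y_train_mean):
    """External q^2 under one common definition (an assumption here):
    1 - sum((y - y_hat)^2) / sum((y - mean of training y)^2),
    with both sums taken over the external test set.
    """
    press = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss = sum((yt - y_train_mean) ** 2 for yt in y_true)
    if ss == 0:
        raise ValueError("test values equal the training mean; q2 is undefined")
    return 1.0 - press / ss
```

With this definition, a model that always predicts the training-set mean scores 0 and a perfect model scores 1, so the reported ranges can be read as the fraction of test-set variance recovered relative to that baseline.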

Highlights

  • Organizations including the Organisation for Economic Co-operation and Development (OECD), the U.S. Food and Drug Administration (FDA), the National Institutes of Health (NIH), and the European Agency for the Evaluation of Medicinal Products (EMEA) strongly recommend the use of alternative in vitro or in silico toxicity assessment methods that avoid the use of animals [1,2,3,4]

  • Most quantitative structure–activity relationship (QSAR) models were built from small data sets of congeneric compounds [8,9,10] and had limited application domains

Introduction

Determination of acute toxicity, expressed as median lethal dose (LD50), is one of the most important steps in the drug discovery pipeline.

Another study, reported by Raevsky and coworkers [13], proposed a so-called Arithmetic Mean Toxicity (AMT) modelling approach, which produced local models based on a k-nearest-neighbors approach. This approach gave correlation coefficients (r²) from 0.456 to 0.783 for 10,241 tested compounds, but the prediction accuracy for a molecule depended on the number and structural similarity of its neighbors with experimental data in the training set [13]. Lu et al. [14] employed the local lazy learning (LLL) method to develop LD50 prediction models, in which the rat acute toxicity of a molecule is predicted from the experimental data of its k nearest neighbors. Like Raevsky's approach [13], Lu's approach relied on prior knowledge of the experimental data of a query's neighbors, and the actual prediction capability of this method was tied to the chemical diversity and structural coverage of the training set [15].
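To make the dependence on neighbors concrete, the following is a minimal sketch of a generic nearest-neighbor predictor, not a reproduction of the AMT or LLL methods. It assumes that each compound is represented as a set of fingerprint bit indices, that similarity is the Tanimoto coefficient, and that the prediction is the plain arithmetic mean of the k most similar training compounds. The names `tanimoto` and `knn_mean_predict` and the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_mean_predict(query_fp, training, k=3):
    """Predict a toxicity value as the mean over the k most similar compounds.

    training: list of (fingerprint_bits, experimental_value) pairs.
    Assumption: unweighted mean; published methods may weight by similarity.
    """
    if not training:
        raise ValueError("training set is empty")
    # Rank training compounds from most to least similar to the query
    ranked = sorted(training,
                    key=lambda item: tanimoto(query_fp, item[0]),
                    reverse=True)
    neighbors = ranked[:k]
    return sum(value for _, value in neighbors) / len(neighbors)


# Toy example with made-up fingerprints and values
toy_training = [({1, 2, 3}, 2.0), ({1, 2}, 3.0), ({7, 8}, 10.0)]
toy_prediction = knn_mean_predict({1, 2, 3}, toy_training, k=2)
```

In the toy data, the third compound shares no bits with the query, so it is excluded when k = 2. If the training set held only such dissimilar compounds, the prediction would still be returned but would carry little information, which is the limitation described above.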
