With the rapid increase in the number of commercial chemicals, median lethal dose (LD50) testing methods that rely on animal experiments face challenges such as high costs and ethical concerns. Classical quantitative structure-activity relationship models that rely on a single algorithm often lack interpretability and precision, given the complexity of the mechanisms underlying acute toxicity. To address these issues, this study developed a predictive framework using an ensemble learning model based on Super-learner. Specifically, we first obtained LD50 data for 9,843 compounds and constructed 16 meta models by combining 4 molecular descriptors with machine learning algorithms. The Super-learner model performed well, achieving R² values of 0.61 and 0.64 in five-fold cross-validation and on the test set, respectively, with corresponding root mean square errors of 0.55 and 0.64, significantly outperforming the individual models. Additionally, we incorporated data filtering and applicability domain methods, which demonstrated that the Super-learner can mitigate the impact of dataset noise to some extent. The model achieved an R² of 0.76 within the applicability domain, ensuring prediction accuracy within that chemical space. Compared with previous studies, the Super-learner model developed here generally achieved better performance across a larger applicability domain. Finally, we have launched an online tool (http://sltox.hhra.net) that allows users to quickly predict the LD50 of compounds, greatly simplifying the chemical safety assessment process. This study not only provides an effective and cost-efficient method for predicting chemical toxicity but also offers technical support and data for the risk assessment of chemicals.
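The stacking idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `StackingRegressor` as a stand-in for the Super-learner, uses synthetic data in place of the LD50 dataset, and builds only 2 x 2 = 4 base models from two hypothetical "descriptor sets" (column blocks named `desc_a` and `desc_b`) and two arbitrarily chosen algorithms. The non-negative linear meta-learner is likewise an assumption, chosen because Super Learner weights are commonly constrained to be non-negative.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a descriptor matrix and LD50 values (NOT real toxicity data).
X, y = make_regression(n_samples=400, n_features=20, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical descriptor sets, modelled here as column blocks of X.
descriptor_sets = {"desc_a": list(range(0, 10)), "desc_b": list(range(10, 20))}

# Hypothetical algorithm choices; factories give each base model a fresh estimator.
algorithms = {
    "ridge": lambda: Ridge(alpha=1.0),
    "rf": lambda: RandomForestRegressor(n_estimators=50, random_state=0),
}


def select_columns(columns):
    """Pass through only the given columns and drop all others."""
    return ColumnTransformer([("keep", "passthrough", columns)])


# One base model per (descriptor set, algorithm) pair.
base_models = [
    (f"{desc_name}_{algo_name}", make_pipeline(select_columns(columns), make_algo()))
    for desc_name, columns in descriptor_sets.items()
    for algo_name, make_algo in algorithms.items()
]

# The meta-learner is fitted on five-fold out-of-fold predictions of the base models.
super_learner = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(positive=True),  # non-negative weights (assumption)
    cv=5,
)
super_learner.fit(X_train, y_train)

y_pred = super_learner.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"base models: {len(base_models)}, test R2: {test_r2:.2f}, test RMSE: {test_rmse:.2f}")
```

Extending the two dictionaries to four entries each would give a 16-model ensemble of the same shape as the one in the abstract, although the actual descriptors and algorithms used in the study are not specified here.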