Abstract

Animal toxicity testing is time- and resource-intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can help predict potential toxicity. However, much of the toxicity data used to train ML models is biased, with an unequal balance of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxic effect. To investigate the impact this bias has on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose toxicity outcomes with a minimum of 50 positive and 50 negative substances, 18 study–toxicity outcome combinations were evaluated in up to seven ML models. These included Artificial Neural Network, Random Forest, Bernoulli Naïve Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. The mean CV F1 performance for unbalanced data across all classifiers and descriptors for chronic liver effects was 0.735 (0.0395 SD). Mean CV F1 performance dropped to 0.639 (0.073 SD) with over-sampling approaches, though the poorer performance of k-NN approaches in some cases contributed to the observed decrease (mean CV F1 performance excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 performance was much lower: 0.089 (0.111 SD) for unbalanced approaches and 0.149 (0.084 SD) for under-sampling. Over-sampling approaches led to an increase in mean CV F1 performance (0.234 (0.107 SD)) for developmental liver toxicity. Model performance was found to be dependent on dataset, model type, balancing approach, and feature selection. Accordingly, ML workflows for predicting toxicity should be tailored to account for class imbalance, and simple classifiers should be tried first.
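The balancing-within-cross-validation workflow described above can be sketched as follows. This is not the authors' code: it uses a synthetic imbalanced binary dataset, a Random Forest stand-in for the paper's classifiers, and simple random over-/under-sampling implemented by hand; the key point it illustrates is that resampling must be applied only to the training fold, so the F1 score is still computed on data with the original class imbalance.

```python
# Illustrative sketch (assumed libraries: numpy, scikit-learn), not the study's workflow.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

def resample(X, y, mode):
    """Randomly over- or under-sample to a 1:1 class ratio ('none' leaves data as-is)."""
    if mode == "none":
        return X, y
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if mode == "over":   # duplicate minority examples up to the majority count
        extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        idx = np.concatenate([majority, minority, extra])
    else:                # "under": discard majority examples down to the minority count
        keep = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([keep, minority])
    return X[idx], y[idx]

# Synthetic stand-in for an imbalanced toxicity dataset (~15% positives).
X, y = make_classification(n_samples=600, weights=[0.85, 0.15], random_state=0)

for mode in ("none", "over", "under"):
    scores = []
    # Resample ONLY the training fold; the held-out fold keeps its true imbalance.
    for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
        Xt, yt = resample(X[train], y[train], mode)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xt, yt)
        scores.append(f1_score(y[test], clf.predict(X[test])))
    print(f"{mode:>5}: mean CV F1 = {np.mean(scores):.3f}")
```

Resampling inside each fold, rather than once over the whole dataset, avoids leaking duplicated minority examples into the test split, which would otherwise inflate the reported F1.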
