A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model.

Richard Judson, Fathi Elloumi, Zhen Li, R Woodrow Setzer, Imran Shah

doi:10.1186/1471-2105-9-241

Abstract

Background: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.

Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA.

Conclusion: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data is being used to predict in vivo chemical toxicology. From our analysis, we can recommend that several ML methods, most notably SVM and ANN, are good candidates for use in real world applications in this area.
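The evaluation design summarized in the abstract (a classifier scored by K-way cross-validation, with and without a filter-based feature selection step, on data in which only some features are causal) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy generator `simulate`, the use of a Welch t-statistic as the filter criterion, and all helper names are hypothetical, and only KNN (k = 5) is shown out of the six methods compared.

```python
import math
import random


def simulate(n_chem, n_causal, n_irrelevant, effect=1.0, noise_sd=1.0, seed=0):
    """Hypothetical toy data set (not the paper's model): only the first
    n_causal features depend on the label; the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_chem):
        label = i % 2  # balanced toxic (1) / non-toxic (0) classes
        causal = [effect * label + rng.gauss(0.0, noise_sd) for _ in range(n_causal)]
        irrelevant = [rng.gauss(0.0, noise_sd) for _ in range(n_irrelevant)]
        X.append(causal + irrelevant)
        y.append(label)
    return X, y


def t_scores(X, y):
    """Absolute Welch t-statistic of each feature between the two classes
    (an assumed filter criterion)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        scores.append(abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12))
    return scores


def select_top(X, y, n_keep):
    """Filter step: indices of the n_keep highest-scoring features."""
    s = t_scores(X, y)
    return sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:n_keep]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(train_X, train_y)
    )[:k]
    votes = sum(lab for _, lab in nearest)
    return 1 if 2 * votes > k else 0


def cross_validate(X, y, n_folds=5, n_keep=None, k=5, seed=1):
    """K-fold CV accuracy; the filter is refit on each training fold only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        if n_keep is None:
            cols = list(range(len(X[0])))  # no feature selection
        else:
            cols = select_top(tr_X, tr_y, n_keep)
        tr_Xs = [[row[j] for j in cols] for row in tr_X]
        for i in fold:
            x = [X[i][j] for j in cols]
            correct += knn_predict(tr_Xs, tr_y, x, k) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X, y = simulate(200, n_causal=5, n_irrelevant=200)
    print("all 205 features:", cross_validate(X, y))
    print("top 5 by t-score:", cross_validate(X, y, n_keep=5))
```

Refitting the filter on each training fold, rather than ranking features once on the full data set, is a deliberate choice in this sketch: ranking before splitting would leak label information into the held-out chemicals and inflate the apparent benefit of feature selection.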

Highlights

Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can reduce the need for animal testing
We evaluated the performance of different machine learning (ML) methods on simulated data sets generated by a biologically motivated analytic model
This study compared six classification algorithms: Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART) and Support Vector Machines (SVM)
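The highlights refer to simulated data sets produced by a biologically motivated, multi-scale model. Its exact equations are not given in this summary, so the following is only a hypothetical toy generator that mimics the general idea: chemicals hit molecular assays, a pathway counts as perturbed when enough of its assays are hit, and a chemical is labeled toxic when any pathway is perturbed. All parameter names (`p_hit`, `min_hits`, `noise_sd`, and so on) are assumptions for illustration; assays outside every pathway play the role of irrelevant features, and Gaussian noise on the readouts stands in for measurement noise.

```python
import random


def simulate_multiscale(n_chem=300, n_pathways=3, assays_per_pathway=4,
                        n_irrelevant=50, p_hit=0.3, min_hits=2,
                        noise_sd=0.5, seed=0):
    """Hypothetical chemical -> assay -> pathway -> endpoint toy generator.

    Returns (X, y): X[i] holds the noisy readouts of every assay for
    chemical i (causal assays first, irrelevant assays last) and y[i]
    is 1 for toxic, 0 for non-toxic.
    """
    rng = random.Random(seed)
    n_causal = n_pathways * assays_per_pathway
    n_assays = n_causal + n_irrelevant
    X, y = [], []
    for _ in range(n_chem):
        # Molecular scale: each assay is independently hit (1) or not (0).
        hits = [1 if rng.random() < p_hit else 0 for _ in range(n_assays)]
        # Pathway scale: pathway p is driven by its own block of causal assays.
        perturbed = [
            sum(hits[p * assays_per_pathway:(p + 1) * assays_per_pathway]) >= min_hits
            for p in range(n_pathways)
        ]
        # Organism scale: the endpoint is toxic if any pathway is perturbed.
        y.append(1 if any(perturbed) else 0)
        # Observation: true activity plus Gaussian measurement noise.
        # Irrelevant assays are also hit but never influence the label.
        X.append([h + rng.gauss(0.0, noise_sd) for h in hits])
    return X, y


if __name__ == "__main__":
    X, y = simulate_multiscale()
    print(len(X), "chemicals x", len(X[0]), "assays;", sum(y), "labeled toxic")
```

Because the ground-truth rule is known, raising `noise_sd` or `n_irrelevant` in a generator like this should make classification harder in a controlled way, which is what makes simulated data useful for comparing classifiers.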

Summary

Introduction

Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can reduce the need for animal testing. The U.S. EPA is carrying out one such large screening and prioritization experiment, called ToxCast, whose goal is to develop predictive signatures or classifiers that can accurately predict whether a given chemical will or will not cause particular toxicities [4]. This program is investigating a variety of chemically induced toxicity endpoints, including developmental and reproductive toxicity, neurotoxicity and cancer. The initial training set comes from a collection of ~300 pesticide active ingredients for which complete rodent toxicology profiles have been compiled. This set of chemicals will be tested in several hundred in vitro assays.



Journal: BMC Bioinformatics
Publication Date: May 19, 2008
Citations: 109
License type: CC BY 2.0
