Abstract

Previous modelling of the median lethal dose (oral rat LD50) has indicated that local class-based models yield better correlations than global models. We evaluated the hypothesis that dividing the dataset by pesticidal mechanism would improve prediction accuracy. A linear discriminant analysis (LDA)-based approach was used to assign each chemical an indicator: the pesticide target species, the mode of action, or the combination of target species and mode of action. The LDA models predicted these indicators with about 87% accuracy. Toxicity was then predicted using the QSAR model fit to the chemicals sharing the assigned indicator. Toxicity was also predicted using a global hierarchical clustering (HC) approach, which divides the dataset into clusters based on molecular similarity. At a comparable prediction coverage (~94%), the global HC method yielded slightly higher prediction accuracy (r2 = 0.50) than the LDA method (r2 ~ 0.47). A single model fit to the entire training set yielded the poorest results (r2 = 0.38), indicating that there is an advantage to clustering the dataset to predict acute toxicity. Finally, this study shows that whilst dividing the training set into subsets (i.e. clusters) improves prediction accuracy, it may not matter which method (expert-based or purely machine learning) is used to divide the dataset into subsets.
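The two-stage idea described above (assign a class first, then predict with a model fit only to that class) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the class labels "A" and "B" and the descriptors `z` and `x` are hypothetical, a nearest-centroid rule stands in for LDA, a one-descriptor least-squares line stands in for a QSAR model, and `r_squared` is the coefficient of determination, which may differ from the r2 statistic reported in the study.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Synthetic training rows: (z, x, toxicity, class label). All values are made up.
# z separates the classes; x drives toxicity differently within each class.
train = [
    (0.1, 1.0, 3.0, "A"), (0.3, 2.0, 5.0, "A"), (-0.2, 3.0, 7.0, "A"), (0.0, 4.0, 9.0, "A"),
    (9.8, 1.0, 7.0, "B"), (10.1, 2.0, 6.0, "B"), (10.3, 3.0, 5.0, "B"), (9.9, 4.0, 4.0, "B"),
]
# Synthetic test rows: (z, x, toxicity); the class label is unknown at prediction time.
test = [(0.2, 2.5, 6.0), (10.0, 2.5, 5.5), (0.1, 3.5, 8.0), (9.7, 1.5, 6.5)]

labels = sorted({row[3] for row in train})

# Stage 1 (stand-in for LDA): assign each chemical to the nearest class centroid in z.
centroids = {c: mean(z for z, _, _, lab in train if lab == c) for c in labels}

def assign_class(z):
    return min(labels, key=lambda c: abs(z - centroids[c]))

# Stage 2: one local model per class, fit only to that class's chemicals.
local_models = {
    c: fit_line([x for _, x, _, lab in train if lab == c],
                [y for _, _, y, lab in train if lab == c])
    for c in labels
}

# Baseline: a single global model fit to the entire training set.
global_model = fit_line([x for _, x, _, _ in train], [y for _, _, y, _ in train])

def predict(model, x):
    a, b = model
    return a * x + b

y_true = [y for _, _, y in test]
y_local = [predict(local_models[assign_class(z)], x) for z, x, _ in test]
y_global = [predict(global_model, x) for _, x, _ in test]

r2_local = r_squared(y_true, y_local)
r2_global = r_squared(y_true, y_global)
print(f"local r2 = {r2_local:.2f}, global r2 = {r2_global:.2f}")
```

The toy data are deliberately constructed so that the two classes follow opposite trends, which is the situation in which a single global fit does worst; real toxicity data are far noisier, and the gap between local and global models reported above is correspondingly smaller.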
