Toxicity Modeling and Prediction with Pattern Recognition

Svante Wold, Sven Hellberg, William J Dunn

doi:10.2307/3430076

Abstract

Empirical models can be constructed relating the change in toxicity to the change in chemical structure for series of similar compounds or mixtures. The first step is to translate the variation in structure into numerical descriptors. This gives a data table (a data matrix denoted by X), which is then analyzed. The same type of model can be used to relate the variation of in vivo data to the variation of a battery of in vitro tests. A single data-analytical model cannot be applied to a set of compounds of diverse chemical structure. For such data sets, separate models must be developed for each subgroup of compounds. The data-analytical problem is then partly one of classification, or pattern recognition (PARC). The assumption of structural and biological similarity within each subset of modeled compounds is then essential for empirical models to apply. PARC is often used to classify compounds as active (toxic) or inactive. The data structure is then often asymmetric, which puts special demands on the data analysis and makes the traditional PARC methods inapplicable. Depending on the information desired from the data analysis and on the type of data available, four levels of PARC can be distinguished: (I) the data X are used to develop rules for classifying future compounds into one of the classes represented in X; (II) same as I, but the possibility of future compounds belonging to "unknown" classes not represented in X is taken into account; (III) same as II, plus the quantitative prediction of one activity variable (here toxicity) in some classes; (IV) same as III, but several quantitative activity (toxicity) variables are predicted.
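
As an illustration of PARC levels I and II, the sketch below fits a separate principal-component model to each class in X and assigns a new compound to the class whose model it fits best, or to "unknown" if it fits none of them. The abstract does not specify the modeling method, so the per-class principal-component approach, the residual-distance cutoff of 3.0, the function names, and the synthetic descriptor values are all assumptions made for the example, not the authors' procedure.

```python
import numpy as np

def fit_class_model(X, n_components=1):
    """Fit a principal-component model to one class (rows = compounds)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered class data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    resid = Xc - Xc @ P @ P.T
    # Typical residual size of the training compounds (assumed scaling).
    dof = max(X.shape[0] - n_components - 1, 1)
    s0 = np.sqrt((resid ** 2).sum() / dof)
    return {"mean": mean, "P": P, "s0": s0}

def residual_distance(model, x):
    """Distance from compound x to the class model's subspace."""
    xc = x - model["mean"]
    e = xc - model["P"] @ (model["P"].T @ xc)
    return np.sqrt((e ** 2).sum())

def classify(models, x, cutoff=3.0):
    """Level II rule: best-fitting class, or 'unknown' if no class fits."""
    ratios = {name: residual_distance(m, x) / m["s0"]
              for name, m in models.items()}
    best = min(ratios, key=ratios.get)
    return best if ratios[best] <= cutoff else "unknown"

# Synthetic descriptor data (hypothetical): two classes, three descriptors.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 10)
zeros = np.zeros_like(t)
X_toxic = np.column_stack([t, 2 * t, zeros]) + rng.normal(0, 0.01, (10, 3))
X_inactive = np.column_stack([zeros + 5, 5 + t, 5 + t]) + rng.normal(0, 0.01, (10, 3))

models = {
    "toxic": fit_class_model(X_toxic),
    "inactive": fit_class_model(X_inactive),
}

print(classify(models, np.array([0.5, 1.0, 0.0])))
print(classify(models, np.array([5.0, 5.5, 5.5])))
print(classify(models, np.array([10.0, -10.0, 10.0])))
```

Setting the cutoff to infinity reduces the rule to level I, where every compound is forced into one of the modeled classes. Levels III and IV would additionally require a regression of one or several toxicity variables within each class, which this sketch does not attempt.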
