Classification of hepatotoxicity of compounds based on cytotoxicity assays is improved by additional interpretable summaries of high-dimensional gene expression data

Marieke Stolte, Wiebke Albrecht, Tim Brecklinghaus, Lisa Gründler, Peng Chen, Jan G Hengstler, Franziska Kappenberg, Jörg Rahnenführer

doi:10.1016/j.comtox.2023.100288

Abstract

Established cytotoxicity assays are commonly used to assess the hepatotoxic risk of a compound. Adding gene expression measurements from high-dimensional RNAseq experiments offers the potential for improved classification. However, it is generally not clear how best to summarize the high-dimensional gene measurements into meaningful variables. We propose several intuitive methods for reducing the dimension of gene expression measurements to interpretable variables and explore their relevance in predicting hepatotoxicity, using a dataset with 60 compounds. Several advanced statistical learning algorithms are evaluated as classification methods, and their performance is compared on this dataset. The best predictions are achieved by tree-based methods such as random forest and xgboost, and tuning the parameters of the algorithm helps to improve the classification accuracy. The simultaneous use of data from cytotoxicity assays and of gene expression variables summarized in different ways has a synergistic effect and leads to a better prediction of hepatotoxicity than either set of variables alone. Further, when gene expression data are summarized, different strategies for generating interpretable variables each contribute to the overall improvement in prediction quality. When cytotoxicity assays are considered alone, the best classification method yields a mean accuracy of 0.757, while the same classification method combined with an optimal choice of variables yields a mean accuracy of 0.811. The overall best value for the mean accuracy is 0.821.
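To make the idea of an interpretable summary concrete, the following minimal sketch collapses one compound's per-gene log2 fold changes into a handful of variables. The abstract does not specify which summaries are used, so the function name `summarize_expression`, the threshold of 1.0, and the four summary variables are assumptions chosen for illustration, and the input values are made up rather than taken from the study.

```python
from statistics import mean


def summarize_expression(log2_fold_changes, threshold=1.0):
    """Collapse per-gene log2 fold changes of one compound into a few variables.

    Hypothetical summaries for illustration only; not necessarily those
    proposed in the paper.
    """
    up = [v for v in log2_fold_changes if v >= threshold]
    down = [v for v in log2_fold_changes if v <= -threshold]
    return {
        # Number of genes induced or repressed beyond the threshold.
        "n_up": len(up),
        "n_down": len(down),
        # Share of all measured genes that changed in either direction.
        "frac_changed": (len(up) + len(down)) / len(log2_fold_changes),
        # Overall magnitude of the expression response.
        "mean_abs_change": mean(abs(v) for v in log2_fold_changes),
    }


# Made-up values for one hypothetical compound (not data from the study).
example = [0.1, -0.2, 1.5, 2.0, -1.2, 0.0, 0.3, -0.4]
summary = summarize_expression(example)
```

Variables of this kind could then be placed alongside the cytotoxicity assay variables as inputs to a classifier such as a random forest, so that each predictor keeps a direct biological reading.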

Full Text