Abstract

Detection of biomarker genes and their regulatory doses of chemical compounds (DCCs) is one of the most important tasks in toxicogenomic studies as well as in drug design and development. There is an online computational platform “Toxygates” to identify biomarker genes and their regulatory DCCs by co-clustering approach. Nevertheless, the algorithm of that platform based on hierarchical clustering (HC) does not share gene-DCC two-way information simultaneously during co-clustering between genes and DCCs. Also it is sensitive to outlying observations. Thus, this platform may produce misleading results in some cases. The probabilistic hidden variable model (PHVM) is a more effective co-clustering approach that share two-way information simultaneously, but it is also sensitive to outlying observations. Therefore, in this paper we have proposed logistic probabilistic hidden variable model (LPHVM) for robust co-clustering between genes and DCCs, since gene expression data are often contaminated by outlying observations. We have investigated the performance of the proposed LPHVM co-clustering approach in a comparison with the conventional PHVM and Toxygates co-clustering approaches using simulated and real life TGP gene expression datasets, respectively. Simulation results show that the proposed method improved the performance over the conventional PHVM in presence of outliers; otherwise, it keeps equal performance. In the case of real life TGP data analysis, three DCCs (glibenclamide-low, perhexilline-low, and hexachlorobenzene-medium) for glutathione metabolism pathway dataset as well as two DCCs (acetaminophen-medium and methapyrilene-low) for PPAR signaling pathway dataset were incorrectly co-clustered by the Toxygates online platform, while only one DCC (hexachlorobenzene-low) for glutathione metabolism pathway was incorrectly co-clustered by the proposed LPHVM approach. Our findings from the real data analysis are also supported by the other findings in the literature.

Highlights

  • Toxicogenomics studies combines toxicology with several omics technologies to assess the risk of toxins and chemical agents in organism (NRC, 2007; Afshari et al, 2011)

  • We investigate the performance of our proposed method (LPHVM) by comparing it with the conventional probabilistic hidden variable model (PHVM) using simulated datasets D1 and D2 in absence and presence of outlying observations for robust co-clustering between genes and doses of chemical compounds (DCCs) to discover biomarker genes and their regulatory DCCs

  • Every time of data simulation outliers are introduced in the dataset using the data contamination methods Tukey– Huber contamination model (THCM) and independent contamination model (ICM) at the same time error rate (ER) are calculated for PHVM and logistic probabilistic hidden variable model (LPHVM) applying these methods on the datasets

Read more

Summary

Introduction

Toxicogenomics studies combines toxicology with several omics technologies (genomics, transcriptomics, proteomics, and metabolomics) to assess the risk of toxins (small molecules, peptides, or proteins) and chemical agents (drugs, gasoline, alcohol, pesticides, fuel oil, and cosmetics) in organism (NRC, 2007; Afshari et al, 2011). Through integration of these omics technologies with bioinformatics, toxicogenomics can be used to suggest the molecular mechanism of toxicity. These toxicogenomic biomarkers can be identified from the extensive gene-treatment expression dataset of target organs of individuals (Fielden et al, 2007; Uehara et al, 2008; Igarashi et al, 2015)

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call