Abstract

A major challenge in drug development is the safety and toxicity risk posed by drug side effects. One such side effect, drug-induced liver injury (DILI), is considered a primary factor in regulatory clearance. The goal of the Critical Assessment of Massive Data Analysis (CAMDA) 2020 CMap Drug Safety Challenge was to develop prediction models based on gene perturbation of six preselected cell lines (CMap L1000), extended structural information (MOLD2), toxicity data (TOX21), and FDA reporting of adverse events (FAERS). Four DILI classifications were targeted: two clinically relevant scores and two control classifications, all designed by the CAMDA organizers. The L1000 gene expression data had variable drug coverage across cell lines, with only 247 of the 617 drugs in the study measured in all six cell types. We addressed this coverage issue by using Kru-Bor ranked merging to generate a single drug expression signature across all six cell lines. These merged signatures were then narrowed down to the top and bottom 100, 250, 500, or 1,000 genes most perturbed by drug treatment, and subjected to feature selection using Fisher’s exact test to identify genes predictive of DILI status. Models based solely on expression signatures had varying results for clinical DILI subtypes, with accuracy ranging from 0.49 to 0.67 and Matthews Correlation Coefficient (MCC) values ranging from -0.03 to 0.1. Models built using FAERS, MOLD2, and TOX21 had similar results in predicting clinical DILI scores, with accuracy ranging from 0.56 to 0.67 and MCC values ranging from 0.12 to 0.36. To combine these data types with the expression-based models, we used soft, hard, and weighted ensemble voting over the top three performing models for each DILI classification. These voting models achieved balanced accuracies of up to 0.54 and 0.60 for the clinically relevant DILI subtypes. Overall, our experiments suggest that traditional machine learning approaches may not be optimal for classifying the current data.
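
The signature-narrowing and feature-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the input format (a dict of gene scores per drug), the function names, the significance cutoff `alpha`, and above all the layout of the contingency table (gene inside or outside a drug's top/bottom set versus DILI label) are assumptions made here for illustration. Fisher's exact test is implemented directly from the hypergeometric distribution so that the example needs only the standard library, and no multiple-testing correction is applied.

```python
from math import comb


def top_bottom_genes(signature, n):
    """Return the n lowest- and n highest-scoring genes of one merged signature.

    `signature` is a hypothetical dict mapping gene -> perturbation score.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(signature, key=signature.get)  # ascending by score
    return set(ranked[:n]) | set(ranked[-n:])


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def select_genes(signatures, labels, n, alpha=0.05):
    """Return {gene: p-value} for genes whose membership tracks the DILI label.

    Assumed table layout (not specified in the source):
    rows = gene in / not in the drug's top-bottom set, columns = DILI / no DILI.
    """
    members = {drug: top_bottom_genes(sig, n) for drug, sig in signatures.items()}
    selected = {}
    for gene in sorted(set().union(*members.values())):
        a = sum(gene in m and labels[drug] == 1 for drug, m in members.items())
        b = sum(gene in m and labels[drug] == 0 for drug, m in members.items())
        c = sum(gene not in m and labels[drug] == 1 for drug, m in members.items())
        d = sum(gene not in m and labels[drug] == 0 for drug, m in members.items())
        p = fisher_exact_two_sided(a, b, c, d)
        if p < alpha:
            selected[gene] = p
    return selected
```

Calling `select_genes(signatures, labels, n=100)` would correspond to the smallest of the four signature sizes mentioned above. With several hundred drugs and thousands of candidate genes, a real analysis would likely also need to control for multiple testing, which this sketch leaves out.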

Highlights

  • Adverse drug reactions (ADRs) are a common concern of novel drugs and therapeutics

  • Of the many models we built, we selected the three with the highest area under the curve (AUC) to predict the Drug-Induced Liver Injury (DILI) class on the test set

  • We evaluated how well these datasets predict four DILI types: DILI1, DILI3, DILI5, and DILI6
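
The model-selection and voting steps named in the highlights and abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the 0.5 decision threshold, and the use of each model's AUC as its weight in weighted voting are all choices made here. Each model is assumed to output a probability of the positive DILI class for every drug. AUC is computed with the pairwise (Mann-Whitney) formulation, and the ensemble is scored with the two metrics the abstract reports, MCC and balanced accuracy.

```python
from math import sqrt


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def top_models(y_true, model_probs, k=3):
    """Names of the k models with the highest AUC (ties keep input order)."""
    return sorted(model_probs, key=lambda m: auc(y_true, model_probs[m]), reverse=True)[:k]


def hard_vote(prob_lists, threshold=0.5):
    """Majority vote over each model's thresholded class prediction."""
    n_models = len(prob_lists)
    return [int(2 * sum(p >= threshold for p in sample) > n_models)
            for sample in zip(*prob_lists)]


def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Threshold the (optionally weighted) mean probability across models."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    return [int(sum(w * p for w, p in zip(weights, sample)) / total >= threshold)
            for sample in zip(*prob_lists)]


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    return (pairs.count((1, 1)), pairs.count((0, 0)),
            pairs.count((0, 1)), pairs.count((1, 0)))


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (both classes must be present)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Toy usage with made-up probabilities for four hypothetical models.
y = [1, 1, 1, 0, 0, 0]
probs = {
    "A": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "B": [0.9, 0.6, 0.4, 0.5, 0.2, 0.1],
    "C": [0.6, 0.4, 0.7, 0.6, 0.3, 0.2],
    "D": [0.2, 0.5, 0.4, 0.6, 0.7, 0.3],
}
best = top_models(y, probs)
chosen = [probs[m] for m in best]
weighted = soft_vote(chosen, weights=[auc(y, p) for p in chosen])
print(best, mcc(y, hard_vote(chosen)), mcc(y, weighted))
```

Hard voting discards each model's confidence, while soft and weighted voting keep it, so the three schemes can disagree on borderline drugs. Note that selecting models by AUC on the same data used to score the ensemble would overstate performance; the toy example does so only for brevity.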

Introduction

Adverse drug reactions (ADRs) are a common concern with novel drugs and therapeutics. One of the more common targets of ADRs is the liver, owing to its role in the metabolism of compounds; the resulting liver damage is termed Drug-Induced Liver Injury (DILI) (Daly, 2013; Atienzar et al, 2016; Marzano et al, 2016). The U.S. Food and Drug Administration (2021) has established the DILIrank dataset, the largest reference drug list ranked for DILI risk in humans, to facilitate the development of predictive models by enhancing drug label DILI annotation with weighted causal evidence (Chen et al, 2016b). This dataset assigns 1,036 FDA-approved drugs to four classifications: most-, less-, ambiguous-, and no-DILI concern. MicroRNA-122 has also been confirmed as a sensitive biomarker for DILI (Messner et al, 2020).

