Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset

Chuizheng Meng,James Enouen,Nan Xu,Yan Liu,Loc Trinh

doi:10.1038/s41598-022-11012-2

Open Access

https://doi.org/10.1038/s41598-022-11012-2

Journal: Scientific Reports
Publication Date: May 3, 2022
Citations: 75
License type: open-access

Affiliation: University of Southern California

Abstract

The recent release of large-scale healthcare datasets has greatly propelled research on data-driven deep learning models for healthcare applications. However, because such deep models are black boxes, concerns about interpretability, fairness, and bias in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset, and conduct comprehensive analyses of interpretability, dataset representation bias, and prediction fairness of deep learning models for in-hospital mortality prediction. First, we analyze the interpretability of deep learning mortality prediction models and observe that (1) the best-performing interpretability method successfully identifies critical features for mortality prediction across various prediction models and also recognizes new important features that domain knowledge does not consider; (2) prediction models rely on demographic features, raising concerns about fairness. We therefore evaluate the fairness of the models and do observe unfairness: (1) there is disparate treatment in prescribing mechanical ventilation among patient groups across ethnicity, gender, and age; (2) models often rely on racial attributes unequally across subgroups to generate their predictions. We further draw concrete connections between interpretability methods and fairness metrics by showing how feature importance from interpretability methods can help quantify potential disparities in mortality predictors. Our analysis demonstrates that prediction performance is not the only factor to consider when evaluating models for healthcare applications, since high prediction performance might result from unfair use of demographic features. Our findings suggest that future research on AI models for healthcare applications can benefit from adopting this interpretability and fairness analysis workflow and from verifying whether models achieve superior performance at the cost of introducing bias.
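
As a rough illustration of the kind of group-level disparity check the abstract alludes to, the sketch below compares how often a binary mortality classifier predicts the positive class in each patient group. The abstract does not name the fairness metrics used in the paper, so the choice of a demographic parity gap, the function names, and the toy data here are all assumptions made for illustration only.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Return the fraction of positive (1) predictions within each group."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        counts[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / counts[g] for g in counts}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups.

    A gap of 0 means every group receives positive predictions at the same
    rate; larger values indicate a larger disparity. This is one common
    group-fairness measure, not necessarily the one used in the paper.
    """
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: binary predictions and a group label per patient.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = positive_rate_by_group(preds, groups)
gap = demographic_parity_gap(preds, groups)
print(rates, gap)
```

In this toy data, group A receives a positive prediction for 3 of 4 patients and group B for 1 of 4, so the gap is 0.5. A gap like this only flags a difference in prediction rates; whether it reflects unfairness depends on the clinical context and on the true outcome rates in each group.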
