Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier.

Elizabeth A Campbell, Saurav Bose, Aaron J Masino

doi:10.1371/journal.pdig.0000642

Abstract

Electronic Health Records (EHRs) are increasingly used to develop machine learning models in predictive medicine. There has been limited research on using machine learning methods to predict childhood obesity, and on related disparities in classifier performance among vulnerable patient subpopulations. In this work, classification models are developed to recognize pediatric obesity using temporal condition patterns obtained from patient EHR data in a U.S. study population. We trained four machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosted Trees, and Neural Networks) to classify cases and controls as obesity positive or negative, and optimized hyperparameter settings through a bootstrapping methodology. To assess the classifiers for bias, we studied model performance by population subgroup, then used permutation analysis to identify the most predictive features for each model and the demographic characteristics of patients with these features. Mean AUC-ROC values were consistent across classifiers, ranging from 0.72 to 0.80. Some evidence of bias was identified, although it took the form of the models performing better for minority subgroups (African Americans and patients enrolled in Medicaid). Permutation analysis revealed that patients from vulnerable population subgroups were over-represented among patients with the most predictive diagnostic patterns. We hypothesize that our models performed better on under-represented groups because the features more strongly associated with obesity were more commonly observed among minority patients. These findings highlight the complex ways in which bias may arise in machine learning models. Future research can build on them to develop a thorough analytical approach for identifying and mitigating bias that arises from features and within EHR datasets, in support of more equitable models.
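The two bias checks named in the abstract, AUC-ROC by population subgroup and permutation analysis of features, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the `toy_model` scorer, and every function name below are hypothetical, and the pairwise AUC-ROC calculation is chosen only to keep the example dependency-free.

```python
import random
from collections import defaultdict


def auc_roc(labels, scores):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None if a class is missing.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(labels, scores, groups):
    """Compute AUC-ROC separately for each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc_roc(ys, ss) for g, (ys, ss) in buckets.items()}


def permutation_importance(predict, rows, labels, n_features,
                           n_repeats=20, seed=0):
    """Mean drop in AUC-ROC when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = auc_roc(labels, [predict(r) for r in rows])
    drops = []
    for j in range(n_features):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += baseline - auc_roc(labels, [predict(r) for r in shuffled])
        drops.append(total / n_repeats)
    return drops


# Hypothetical toy data: two binary features, two subgroups "A" and "B".
rows = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1], [1, 1], [0, 0]]
labels = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]


def toy_model(row):
    """Stand-in classifier (an assumption): scores on feature 0 only."""
    return float(row[0])


scores = [toy_model(r) for r in rows]
overall_auc = auc_roc(labels, scores)
subgroup_auc = auc_by_subgroup(labels, scores, groups)
importances = permutation_importance(toy_model, rows, labels, n_features=2)

print("overall AUC-ROC:", overall_auc)
print("AUC-ROC by subgroup:", subgroup_auc)
print("permutation importances:", importances)
```

In this toy setup the scorer separates cases from controls better in one subgroup than the other, which is the kind of performance gap a subgroup analysis is meant to surface. Shuffling the feature the scorer ignores leaves AUC-ROC unchanged, so only the feature it relies on receives a nonzero importance.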


Journal: PLOS Digital Health
Publication Date: Oct 23, 2024
License type: CC BY 4.0

