Abstract

MOTIVATION: AI-based decision-making tools hold the potential to reduce patient suffering and burden. However, healthcare data reflect inequity, and datasets often lack representation of underprivileged groups. Without systems and methods to protect all patients, we risk making dangerous biases automatic and invisible. We can only ensure protection against bias if performance is monitored in production, continuously and with meaningful measures. Notably, a lack of representation of diverse ancestries in genetics data, and bias in its validation, results in genetic misdiagnoses and potential health disparities. Similarly, monitoring calibration in subgroups, in addition to the area under the receiver operating characteristic curve (ROC-AUC), could have prevented biased predictions that underestimated the health needs of sicker patients. Here, we describe methods and results implemented in an AI-based clinical decision-making tool to protect against bias.

METHODS: Our model uses proven techniques and metrics to identify and monitor bias and to quantify the magnitude of its impact. Our algorithm compares the performance of each demographic subgroup to that of the top-performing subgroup, ranked by overall model performance (measured by both ROC-AUC and Brier score). To determine whether the top-performing subgroup is receiving statistically significant preferential treatment, we perform a t-test of each remaining subgroup against the top performer (a p-value < 0.05 indicates a statistically significant difference) using sample-wise performance metrics (log-loss in this case). We then quantify the magnitude of the difference by computing the ratio of the top performer's performance to that of the affected subgroup. The bias detection algorithm runs continuously as our model generates risk predictions from up-to-date patient data. If the algorithm identifies a bias, it immediately notifies data scientists and engineers, who begin rapidly restoring balanced model performance for clinical use by investigating algorithms that optimize for both overall and subgroup-balanced performance.

RESULTS: We trained and validated our machine learning model to predict 30-day emergency department visits using a dataset of over 28,000 oncology patients representing diverse cancer types, races, ethnicities, ages, genders, and socio-economic statuses. Our model performance on a held-out validation dataset was exceptional overall (AUC 0.8, Brier score 0.07) and across all cancer-type and demographic subgroups (AUC 0.74-0.82, Brier score 0.06-0.1). We continuously monitored our live model in production over three months to ensure consistent and fair performance. Overall and subgroup performance in production never fell below an AUC of 0.75 or exceeded a Brier score of 0.1.

CONCLUSIONS: We demonstrate an effective framework for continuously and automatically detecting and combating algorithmic bias in oncology AI decision-making tools.

Citation Format: Renee D. George, Benjamin H. Ellis, Chris J. Sidey-Gibbons, Christine Swisher. Protecting against algorithmic bias of AI-based clinical decision making tools in oncology [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 1970.
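The subgroup comparison described in the METHODS section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names (detect_bias, sample_log_loss, subgroup, alpha) are hypothetical, the use of Welch's t-test is a choice, and taking the ratio of mean log-loss is an assumption, since the abstract does not name the metric used for the ratio.

```python
# Hypothetical sketch of the subgroup bias check described in the abstract.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import roc_auc_score, brier_score_loss


def sample_log_loss(y_true, y_prob, eps=1e-15):
    """Per-sample log-loss for binary outcomes."""
    p = np.clip(y_prob, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


def detect_bias(y_true, y_prob, subgroup, alpha=0.05):
    """Compare every subgroup's sample-wise log-loss against the
    top-performing subgroup (ranked here by ROC-AUC, then Brier score)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    subgroup = np.asarray(subgroup)

    groups = np.unique(subgroup)
    # Rank subgroups by overall performance: highest AUC, ties broken by lower Brier score.
    perf = {
        g: (roc_auc_score(y_true[subgroup == g], y_prob[subgroup == g]),
            -brier_score_loss(y_true[subgroup == g], y_prob[subgroup == g]))
        for g in groups
    }
    top = max(perf, key=perf.get)
    top_losses = sample_log_loss(y_true[subgroup == top], y_prob[subgroup == top])

    flags = {}
    for g in groups:
        if g == top:
            continue
        g_losses = sample_log_loss(y_true[subgroup == g], y_prob[subgroup == g])
        # t-test on sample-wise log-loss: p < alpha suggests the top subgroup
        # is receiving statistically significant preferential treatment.
        _, p_value = ttest_ind(top_losses, g_losses, equal_var=False)
        # Magnitude of the gap: ratio of the top performer's mean log-loss to the
        # affected subgroup's (assumption); values below 1 mean the affected
        # subgroup fares worse.
        ratio = top_losses.mean() / g_losses.mean()
        flags[g] = {"p_value": p_value,
                    "loss_ratio": ratio,
                    "biased": p_value < alpha and ratio < 1.0}
    return top, flags
```

In a production setting such as the one described above, a check like this would run each time the model scores up-to-date patient data, and any flagged subgroup would trigger an alert to the data science and engineering teams.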
