Optimizing the fairness of survival prediction models for racial/ethnic subgroups: A study on predicting post-operative survival in stage IA and IB non-small cell lung cancer.

Kanan Shah, Elaine Shum, Yassamin Neshatvar, Madhur Nayan

doi:10.1200/op.2024.20.10_suppl.380

Abstract

Background: The recent surge in the use of machine learning (ML) to develop prediction models that aid clinical decision-making is promising. However, these models can demonstrate racial bias due to inequities in real-world training data. In lung cancer, multiple models have been developed to predict prognosis, but none have been optimized to mitigate bias in performance among racial/ethnic subgroups. We developed an ML model to predict five-year survival in stage IA-IB non-small cell lung cancer (NSCLC) while optimizing for fairness across racial/ethnic groups.

Methods: In the National Cancer Database, we identified patients with histopathologically confirmed stage IA-IB NSCLC who underwent curative-intent lobectomy from 2004 to 2017. We split the study cohort into training and test sets (70%/30%). We trained and compared various ML models to predict five-year overall survival. Patient demographic, clinical, and disease characteristics were used as input features for the models. To evaluate model fairness, we used the equalized odds ratio (eOR), which compares the true positive and false positive rates across groups; an eOR of 1 represents equivalent rates across racial groups. We used three approaches to mitigate bias and optimize the fairness of the best “naïve” model: grid search, threshold optimizer, and exponentiated gradient methods. We evaluated model performance before and after bias mitigation using the area under the curve (AUC).

Results: A total of 124,298 patients met our inclusion/exclusion criteria; 87% of patients were White, 8% were Black/African American, 3% were Hispanic, and 2% were Asian. Eighty percent of patients were diagnosed with stage IA cancer; 20% had stage IB cancer. The best naïve ML model, not optimized for fairness on race, had an eOR of 0.25 with an overall AUC of 0.66 (95% CI 0.65-0.66).
This model demonstrated an AUC of 0.65 (0.65-0.66) among White patients, 0.64 (0.62-0.66) among Black patients, 0.64 (0.60-0.68) among Asian patients, and 0.71 (0.68-0.74) among Hispanic patients. The threshold optimizer bias mitigation strategy improved fairness the most while maintaining a similar overall AUC of 0.65 (0.64-0.66). With this strategy, the eOR improved to 0.83 while the AUC remained relatively stable across racial subgroups.

Conclusions: We developed an ML model to predict five-year survival in patients undergoing surgery for stage IA-IB NSCLC and employed model bias mitigation strategies that significantly improved model fairness without diminishing overall performance. These strategies should be considered when developing prediction models for clinical decision-making to avoid perpetuating disparities in care due to algorithmic bias.

Table: Model performance metrics.

| Model | Equalized odds ratio | True positive rate | False positive rate | AUC |
|---|---|---|---|---|
| Naïve model | 0.25 | 0.61 | 0.30 | 0.66 |
| Threshold optimizer mitigated model | 0.83 | 0.62 | 0.32 | 0.65 |
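To make the fairness metric from the Methods concrete, the following is a minimal sketch of how an equalized odds ratio could be computed from binary predictions. It assumes one common definition, similar to the between-groups metric of the same name in the fairlearn library: take the ratio of the smallest to the largest group-level true positive rate, do the same for the false positive rate, and report the smaller of the two ratios. The abstract does not state which exact definition or software the authors used, so the function names (`group_rates`, `equalized_odds_ratio`) and the toy data below are hypothetical and purely illustrative.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} for binary labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        # Assumes every group contains at least one positive and one negative case.
        tpr = c["tp"] / (c["tp"] + c["fn"])
        fpr = c["fp"] / (c["fp"] + c["tn"])
        rates[g] = (tpr, fpr)
    return rates


def _min_max_ratio(values):
    """Ratio of the smallest to the largest value; 1.0 if all values are zero."""
    largest = max(values)
    return 1.0 if largest == 0 else min(values) / largest


def equalized_odds_ratio(y_true, y_pred, groups):
    """Assumed definition: the smaller of the TPR ratio and the FPR ratio across groups."""
    rates = group_rates(y_true, y_pred, groups)
    tpr_ratio = _min_max_ratio([r[0] for r in rates.values()])
    fpr_ratio = _min_max_ratio([r[1] for r in rates.values()])
    return min(tpr_ratio, fpr_ratio)


# Toy data (hypothetical, not from the study): two groups of eight patients each.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 8 + ["B"] * 8

rates = group_rates(y_true, y_pred, groups)
eor = equalized_odds_ratio(y_true, y_pred, groups)
print(rates)
print(eor)
```

In this toy example, group A has a true positive rate of 0.75 and a false positive rate of 0.25, while group B has 0.50 for both, so the false positive rate ratio (0.25 / 0.50) is the limiting term. A value of 1 would indicate identical rates in both groups, which matches the interpretation given in the Methods.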
