e13583 Background: Monitoring immune-related adverse events (irAEs) in patients receiving immune checkpoint inhibitor (ICI) treatment relies on analyzing clinical notes and structured data from electronic health records. Because irAEs are not yet fully captured in ICD claims, they may be underreported there; they are, however, usually recorded in clinical notes. A natural language processing (NLP) tool that extracts them from clinical notes is therefore highly desirable. In this study, we compare the performance of our machine learning (ML) NLP model in irAE extraction against 1) manual chart reviews and 2) ICD claims data. Methods: The study validation set included 194 patients with newly diagnosed stage I-IV gastrointestinal cancer treated with ICIs at the Ohio State University Comprehensive Cancer Center. An irAE was defined as the occurrence of a gastrointestinal, pulmonary, dermatological, or endocrine adverse event post-ICI infusion. Our NLP ML model was trained on 1,230 clinical notes from 20 positive (who had ADEs) and 10 negative (who did not have ADEs) patients who received ICIs. The ICD irAE method uses ICD-9 and ICD-10 codes to identify irAEs post-ICI infusion. All manual reviews of ICI-induced irAEs were done by a medical oncologist specializing in gastrointestinal cancers and included patient charts, labs, and imaging post-ICI infusion. Results: In this 194-patient cohort, the median age was 61 years; 68% were male, 89% were White non-Hispanic, and 29% had colorectal cancer. The study included 8,500 medical notes, with a median of 29 notes per patient and 1,283 words per note. An estimated 4,726 (55.6%) were progress notes, 1,178 (13.9%) patient instructions, and 430 (5.1%) consults. The incidences of irAEs were 13.9%, 25.8%, and 27.3% using manual reviews, the ML model, and ICD claims, respectively.
Compared with manual reviews, the ML model had 78% sensitivity, 83% specificity, a 0.17 false positive rate (FPR), and an area under the curve (AUC) of 0.80 (95% CI, 0.72 to 0.89). In contrast, ICD claims agreed poorly with manual reviews: 52% sensitivity, 77% specificity, a 0.23 FPR, and an AUC of 0.64 (95% CI, 0.54 to 0.74). Conclusions: Using clinical notes, ML can identify patients with irAEs and has great potential to infer causality in real-time clinical practice. Our manual review analysis further indicates that ML model performance could be improved by including lab and imaging data.
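Illustrative note: the reported metrics all follow from a 2x2 table of each method against manual review, and FPR is simply 1 minus specificity. The sketch below shows that arithmetic in Python. The cell counts are not given in the abstract; they are an assumption, back-calculated as one set of whole-patient counts that reproduces the reported percentages for a 194-patient cohort. The `Confusion` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of a detection method against manual review (the reference)."""
    tp: int  # flagged by the method, irAE confirmed on manual review
    fn: int  # missed by the method, irAE confirmed on manual review
    fp: int  # flagged by the method, no irAE on manual review
    tn: int  # not flagged, no irAE on manual review

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def fpr(self) -> float:
        # False positive rate, equal to 1 - specificity.
        return self.fp / (self.fp + self.tn)

    @property
    def method_incidence(self) -> float:
        # Share of the cohort the method flags as having an irAE.
        return (self.tp + self.fp) / self.total

    @property
    def reference_incidence(self) -> float:
        # Share of the cohort with an irAE on manual review.
        return (self.tp + self.fn) / self.total


# ASSUMPTION: counts back-calculated from the reported percentages,
# not stated in the abstract.
ml_model = Confusion(tp=21, fn=6, fp=29, tn=138)
icd_claims = Confusion(tp=14, fn=13, fp=39, tn=128)

for name, table in [("ML model", ml_model), ("ICD claims", icd_claims)]:
    print(
        f"{name}: sensitivity={table.sensitivity:.0%}, "
        f"specificity={table.specificity:.0%}, FPR={table.fpr:.2f}, "
        f"incidence={table.method_incidence:.1%}"
    )
```

Under these assumed counts, both methods flag roughly twice as many patients as manual review confirms, so the gap between the 13.9% manual incidence and the 25.8% and 27.3% figures is driven mainly by false positives.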