Detection of immune-related adverse events among hospitalized patients using large language models.

Virginia H Sun, Vineet K Raghu, Azin Ghamari, Molly Fisher Thomas, Julius C Heemelaar, MGH Severe Immunotherapy Complications Service, Steven Michael Blum, Giselle Alexandra Suero-Abreu, Jor Sam Ho, Michael L Dougan, Chia-Yun Wu, Megan J Mooradian, Tomas G Neilan, Ibrahim Hadzic, Jessica Wu, Alexandra-Chloé Villani, Ryan J Sullivan, Daniel A Zlotoff, Leyre Zubiri, Kerry Lynn Reynolds, Meghan E Sise

doi:10.1200/jco.2024.42.16_suppl.2638

Abstract

2638 Background: Immune checkpoint inhibitor (ICI)-induced colitis, hepatitis, and pneumonitis are common immune-related adverse events (irAEs); however, the true incidence of these irAEs remains incompletely understood. Chart review is the gold standard for their detection but is time-consuming and cannot feasibly be applied to large cohorts. ICD codes have limited sensitivity and specificity. Large language models (LLMs) are a scalable method of answering queries from human-generated text, though there are no data on the use of LLMs for the identification of irAEs. Therefore, we investigated the application of an LLM to identify ICI colitis, hepatitis, and pneumonitis among hospitalized patients, comparing its performance to manual chart review and ICD codes. Methods: Hospital admissions of patients on ICI therapy from February 5th, 2011, to November 3rd, 2021, were manually reviewed by a multidisciplinary immunotoxicity team using established published definitions for the presence of ICI colitis, hepatitis, and pneumonitis. Standard ICD codes and an LLM pipeline with retrieval-augmented generation (RAG) were used to detect irAEs. Performance was measured via sensitivity, specificity, and model runtime. The LLM was validated with a second dataset of inpatients with ICI colitis, hepatitis, and pneumonitis admitted from November 4th, 2021, to September 5th, 2023. Results: Among 5,677 hospitalized patients on ICI therapy in the initial cohort, there were 132 cases adjudicated with ICI colitis, 57 with ICI hepatitis, and 47 with ICI pneumonitis. The LLM was more sensitive than ICD codes in detecting all three irAEs (94.2% vs. 71.8%), achieving significance for ICI hepatitis (p<0.001) and pneumonitis (p=0.006), while having similar specificity (92.5% vs. 91.1%, Table 1). The LLM approach was also efficient, spending an average of 9.42 s per chart, compared to an estimated 15 minutes per chart for individual chart review. The mean sensitivity and specificity of the LLM on the validation dataset for adjudicated ICI colitis (n=20), hepatitis (n=24), and pneumonitis (n=6) were 96.9% and 93.2%, respectively. Conclusions: LLMs serve as a useful tool for the detection of ICI colitis, hepatitis, and pneumonitis, significantly outperforming ICD codes in accuracy and manual chart review in efficiency. [Table: see text]
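
For readers less familiar with the metrics reported above, the following is a minimal sketch of how sensitivity and specificity are computed against chart review as the reference standard, and of the runtime arithmetic implied by the per-chart figures. The confusion-matrix counts are hypothetical placeholders, not data from this study, and treating the 5,677 patients as one chart each is an assumption made only for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of adjudicated irAE cases that the method flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of adjudicated non-cases that the method leaves unflagged."""
    return tn / (tn + fp)


# Hypothetical counts for one irAE (NOT taken from the abstract).
tp, fn, tn, fp = 45, 5, 900, 50
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")

# Runtime comparison using the per-chart figures quoted in the abstract.
# Assumption: one chart per patient in the initial cohort.
N_CHARTS = 5677
LLM_SECONDS_PER_CHART = 9.42
MANUAL_MINUTES_PER_CHART = 15

llm_hours = N_CHARTS * LLM_SECONDS_PER_CHART / 3600
manual_hours = N_CHARTS * MANUAL_MINUTES_PER_CHART / 60
speedup = manual_hours / llm_hours

print(f"LLM: {llm_hours:.1f} h, manual review: {manual_hours:.1f} h, "
      f"ratio: {speedup:.0f}x")
```

Under these assumptions the per-chart ratio is 900 s / 9.42 s, so the LLM pipeline would take on the order of hours where manual review would take on the order of hundreds of hours.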
