Abstract The traditional cancer care paradigm, in which a radiographic scan precedes biopsy, which in turn precedes diagnosis, has several limitations; central among them is the lack of patient identification prior to biopsy. Here we describe the results of a retrospective pilot study in which we characterize the output of a natural language processing (NLP) model used to review and flag radiology reports suspicious for pancreas cancer, and demonstrate its potential to identify patients with pancreas cancer at earlier points in the diagnostic framework. A RadBERT-based NLP model was used to identify radiology reports suspicious for pancreas cancer that were generated within the Northwell Health system in January 2023. These outputs were then manually reviewed retrospectively and labelled as non-suspicious non-cystic, non-suspicious cystic, or suspicious non-cystic. For patients with suspicious non-cystic lesions, chart review was performed to determine whether these findings were indicative of a new pancreatic malignancy and, if so, whether the patient had received further care. Our NLP model reviewed 17,339 radiology reports in January 2023 and flagged 917 of them as suspicious for pancreas cancer, representing imaging from 852 patients across our health system. Of these patients, 38% (322) had normal or atypical but non-suspicious non-cystic findings, 46% (389) had non-suspicious cystic lesions, and 17% (149) had non-cystic lesions suspicious for a primary pancreatic neoplasm; 10 of these reports described simultaneous cystic and neoplastic pancreas lesions. Of the patients with suspected neoplastic lesions, 55% had been previously diagnosed with pancreas cancer. 
The remaining 45% were new diagnoses. On chart review, only 36% of these patients underwent biopsy in our health system, with an average of 22 days between radiology report and biopsy, while 30% were seen by an outpatient oncologist within our health system, with an average of 32 days between radiology report and visit. 64% did not proceed to further workup in our health system; of these, further workup was not recommended for 35%, 19% elected against further workup, 23% were scheduled for internal follow-up that did not occur, and 23% were scheduled for external follow-up. Notably, compared with the January outputs of other cohort-building tools, the NLP model identified 67 new patients with pancreas cancer, whereas pathology reports identified 15, the ICD-10 problem list identified 17, and the cancer registry identified 13. Our NLP approach identified more patients with newly diagnosed pancreas cancer than traditional identification measures did. Of the patients identified by our model, in the context of the traditional cancer care paradigm, only about one third proceeded to further workup within our health system, and those who did faced a striking lead time of nearly a month between radiology report and biopsy or outpatient oncology visit. We are now working to use this model to prospectively identify and navigate patients in real time to address this gap.
Citation Format: Kristen M. John, Anthony Carvino, Shama Khan, Ellen Chen, Deepak Saluja, Matthew Barish, Daniel A. King. Retrospective pilot study of a natural language processing model approach for earlier identification of patients with pancreas cancer [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Pancreatic Cancer; 2023 Sep 27-30; Boston, Massachusetts. Philadelphia (PA): AACR; Cancer Res 2024;84(2 Suppl):Abstract nr B068.