Abstract

664
Background: Access to clinical trials and translational research studies is integral to improving care for pancreatic cancer. The in-patient setting is a valuable opportunity to obtain research specimens and establish the transition to out-patient oncology. However, sample coordinators rarely have sufficient lead time before biopsy to consent patients and be present for tissue collection. Meanwhile, histopathology of biopsied specimens may not be complete before discharge, hindering follow-up planning. Polyethnic-1000, a research study enrolling African American patients with pancreatic cancer, is a priority study for our institution and motivated the strategy described below to better obtain research samples and ensure adequate follow-up.
Methods: We observed recurrent suspicious abnormalities on abdominal radiology reports that foreshadow subsequent biopsy of the pancreas. To explore this further, we trained a natural language classifier to identify radiology reports suspicious for pancreatic cancer. The pretrained model used is RadBERT, a custom-built language model based on Google’s BERT and further pretrained on several million radiology text reports. A gastrointestinal oncologist, a clinical research coordinator, and a radiologist informatician curated a dataset of 1136 CT abdominal radiology reports, of which 696 were considered suspicious for pancreatic cancer. From this curated dataset, a training set of 908 reports was used to fine-tune the RadBERT base model and produce our pancreatic cancer suspicion classifier. Validation and test datasets of 114 cases each were reserved for model evaluation. We computed the final classifier performance metrics over the test dataset.
Results: The model obtained an accuracy of 92%, with an F1 score of 0.936, a recall of 0.943, and a precision of 0.930.
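
As a quick arithmetic check on the figures reported above, the following Python sketch confirms that the three data splits add up to the curated total and that the reported F1 score matches the harmonic mean of the reported precision and recall. The constant and function names are illustrative only and are not taken from the study's code.

```python
# Figures reported in the abstract; names are illustrative, not the authors'.
TOTAL_REPORTS = 1136
TRAIN, VALIDATION, TEST = 908, 114, 114
PRECISION, RECALL = 0.930, 0.943


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The training, validation, and test sets should account for every report.
split_total = TRAIN + VALIDATION + TEST

# F1 recomputed from the reported precision and recall.
f1 = f1_score(PRECISION, RECALL)

print(split_total == TOTAL_REPORTS)  # True
print(round(f1, 3))                  # 0.936
```
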
Conclusions: An early-iteration model for classifying CT abdominal radiology reports was highly accurate and has the potential to substantially increase identification of patients who may be suitable for clinical research studies and trials. We are now working to improve and expand our dataset in order to train classifiers with higher and more robust performance, while simultaneously developing an automated, near real-time process that will use our models to identify reports suspicious for pancreatic cancer and then notify a coordinator and navigator.
