A natural language processing tool for automatic identification of new disease and disease progression: Parsing text in multi-institutional radiology reports to facilitate clinical trial eligibility screening.

Eric J Clayton,Maggie D Howell,Sunita Pierson,David Michael Waterhouse,Imon Banerjee,Rachel Hall,Peter B Harrison,Beth Lohmueller,Patrick J Ward

doi:10.1200/jco.2021.39.15_suppl.1555

Eric J Clayton, Maggie D Howell + Show 7 more

https://doi.org/10.1200/jco.2021.39.15_suppl.1555

Copy DOI

Export

Save

Cite

Abstract
Full-Text
Similar Papers

Abstract

Listen

1555 Background: Screening every patient for clinical trials is time-consuming, costly and inefficient. Developing an automated method for identifying patients who have potential disease progression, at the point where the practice first receives their radiology reports, but prior to the patient’s office visit, would greatly increase the efficiency of clinical trial operations and likely result in more patients being offered trial opportunities. Methods: Using Natural Language Processing (NLP) methodology, we developed a text parsing algorithm to automatically extract information about potential new disease or disease progression from multi-institutional, free-text radiology reports (CT, PET, bone scan, MRI or x-ray). We combined semantic dictionary mapping and machine learning techniques to normalize the linguistic and formatting variations in the text, training the XGBoost model particularly to achieve a high precision and accuracy to satisfy clinical trial screening requirements. In order to be comprehensive, we enhanced the model vocabulary using a multi-institutional dataset which includes reports from two academic institutions. Results: A dataset of 732 de-identified radiology reports were curated (two MDs agreed on potential new disease/dz progression vs stable) and the model was repeatedly re-trained for each fold where the folds were randomly selected. The final model achieved consistent precision (>0.87 precision) and accuracy (>0.87 accuracy). See the table for a summary of the results, by radiology report type. We are continuing work on the model to validate accuracy and precision using a new and unique set of reports. Conclusions: NLP systems can be used to identify patients who potentially have suffered new disease or disease progression and reduce the human effort in screening or clinical trials. Efforts are ongoing to integrate the NLP process into existing EHR reporting. New imaging reports sent via interface to the EHR will be extracted daily using a database query and will be provided via secure electronic transport to the NLP system. Patients with higher likelihood of disease progression will be automatically identified, and their reports routed to the clinical trials office for clinical trial screening parallel to physician EHR mailbox reporting. The over-arching goal of the project is to increase clinical trial enrollment. 5-fold cross-validation performance of the NLP model in terms of accuracy, precision and recall averaged across all the folds.[Table: see text]

Full Text