Application of natural language processing to post-structuring of rectal cancer MRI reports

W Liu, L Cai, Y Li

doi:10.1016/j.crad.2023.10.032

Abstract

This study aimed to evaluate a natural language processing (NLP) system for extracting structured information from the free-text rectal cancer magnetic resonance imaging (MRI) reports written in Chinese.

A rule-based NLP model that extracts 11 key imaging features of rectal cancer was constructed using 358 rectal cancer MRI reports written between 2015 and 2021. Fifty reports written before 2015 and 50 written after 2021 served as test datasets, and the reference standard was established by two radiologists who extracted the information manually. The report length and the reporting rate of imaging features were compared between the pre-2015 and post-2021 datasets, as were the accuracy, precision, recall, and F1 score of feature extraction by the NLP system. The time the NLP system needed to extract the data was compared with the time the radiologists needed.

Reports written after 2021 had longer diagnostic impression sections than reports written before 2015. The reporting rate of key imaging features of rectal cancer was 36.55% before 2015 and 79.82% after 2021. For correct extraction of values from reports, the accuracy, precision, recall, and F1 score of the NLP system were 93.82%, 95.63%, 87.06%, and 91.15%, respectively, for pre-2015 reports, and 92.55%, 98.53%, 94.15%, and 96.29%, respectively, for post-2021 reports. The NLP system generated all the structured information in less than 1 second.

The NLP system based on rule-based pattern matching achieved rapid and accurate structuring of rectal cancer MRI reports. MRI reports written with structured templates are better suited to NLP-based information extraction.
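The evaluation pairs rule-based pattern matching with accuracy, precision, recall, and F1 scoring against a manual reference standard. The following Python sketch illustrates both steps under stated assumptions: the three features (`t_stage`, `emvi`, `mrf`), their English regexes, the two sample reports, and the scoring convention (a wrong value counts as both a false positive and a false negative) are all hypothetical. The study's rules targeted 11 features in Chinese-language reports and are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# HYPOTHETICAL rules: three illustrative features as English regexes.
# The study handled 11 features in Chinese reports; these are placeholders.
PATTERNS = {
    "t_stage": re.compile(r"\b(T[1-4][ab]?)\b"),
    "emvi": re.compile(r"\bEMVI\s+(?:is\s+)?(positive|negative)", re.IGNORECASE),
    "mrf": re.compile(r"\bMRF\s+(?:is\s+)?(not involved|involved|clear)", re.IGNORECASE),
}

Extraction = Dict[str, Optional[str]]


def extract(report: str) -> Extraction:
    """Apply each pattern; keep the first captured value, or None if absent."""
    result: Extraction = {}
    for feature, pattern in PATTERNS.items():
        match = pattern.search(report)
        result[feature] = match.group(1) if match else None
    return result


@dataclass
class Counts:
    tp: int = 0     # value extracted and equal to the reference
    fp: int = 0     # value extracted but wrong or not in the reference
    fn: int = 0     # reference value missed or extracted wrongly
    tn: int = 0     # feature absent in both extraction and reference
    total: int = 0  # number of (report, feature) pairs scored


def score(predicted: List[Extraction], reference: List[Extraction]) -> Counts:
    """ASSUMED convention: a wrong value counts as both FP and FN."""
    c = Counts()
    for pred, ref in zip(predicted, reference):
        for feature in PATTERNS:
            p, r = pred.get(feature), ref.get(feature)
            c.total += 1
            if p is not None and p == r:
                c.tp += 1
            elif p is not None:
                c.fp += 1
                if r is not None:
                    c.fn += 1
            elif r is not None:
                c.fn += 1
            else:
                c.tn += 1
    return c


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def metrics(c: Counts) -> Dict[str, float]:
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return {
        "accuracy": (c.tp + c.tn) / c.total if c.total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_pr(precision, recall),
    }


# HYPOTHETICAL sample reports and manual reference values
REPORTS = [
    "Rectal tumour, T3 stage. EMVI positive. MRF not involved.",
    "Low rectal mass, T2. No extramural venous invasion. MRF clear.",
]
REFERENCE: List[Extraction] = [
    {"t_stage": "T3", "emvi": "positive", "mrf": "not involved"},
    {"t_stage": "T2", "emvi": "negative", "mrf": "clear"},  # EMVI stated without the acronym
]

if __name__ == "__main__":
    counts = score([extract(r) for r in REPORTS], REFERENCE)
    print(counts)
    print(metrics(counts))
    # Recombine the precision/recall reported in the abstract into F1
    print(round(f1_from_pr(0.9563, 0.8706), 4), round(f1_from_pr(0.9853, 0.9415), 4))
```

In this toy run, precision stays at 1.0 while recall falls to 5/6, because the second report describes venous invasion without the acronym the hypothetical rule expects. The last line recombines the precision and recall reported in the abstract, giving F1 values of about 0.911 and 0.963, which agree with the reported 91.15% and 96.29% up to rounding of the published inputs.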
