Abstract
Background: Automating data extraction through natural language processing (NLP) can help obtain salient information from unstructured radiology reports. However, to ensure rigorous quality control, scalable gold-standard protocols for manual verification of automatically extracted data are essential. Our aim was to develop a standardized pipeline for verifying large vessel occlusion (LVO) locations in head CT angiograms (CTAs) as determined by an NLP algorithm. This pipeline can be applied to other variables that require manual verification for quality control of retrospective datasets.

Methods: Our initial dataset included 866 patients with 4184 radiology reports; from this we extracted 489 patients with 513 CTA reports who presented to two tertiary-care institutions with acute large ischemic stroke (>1/2 of the middle cerebral artery territory) from 2006 to 2021. We constructed a rule-based feature-detection system using regular expressions and spaCy, a Python NLP library with sentence parsing and negation detection, to automatically identify the location of LVO. We then developed a pipeline for gold-standard verification that included manual review of all reports and documentation of feedback with root-cause analysis to understand automatic errors. Secondary review of selected reports by an attending neurologist was also performed, allowing correction of errors from the initial manual review.

Results: After initial manual review of CTA reports, the automated NLP algorithm had an accuracy of 78.8% for LVO designation. Secondary MD review was performed on 60/513 (11.7%) reports, with 7/60 (11.7%) found to contain errors from the initial manual review. After complete review, the frequencies of LVO locations were 38.4% internal carotid artery (ICA)/tandem, 56.9% M1, 36.3% M2, 4.3% M3/M4, 15.0% anterior cerebral artery (ACA), and 0.4% indeterminate location. The most common cause of NLP error was the inability to interpret radiographic findings in the context of surrounding text. For example, the algorithm misclassified reports containing modifiers such as "small" or "incomplete," which indicate the absence of complete occlusion.

Conclusion: Tailored rule-based NLP methods can expedite radiology report review for stroke patients, but they may err. Protocolized manual review of radiology reports therefore remains necessary, especially for reports with ambiguous language.
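The abstract describes a rule-based system combining regular expressions with spaCy sentence parsing and negation handling. A minimal sketch of such an approach is shown below; the vessel patterns, negation cues, and function names here are illustrative assumptions, not the study's actual rules, which are not published in the abstract. The modifier cues ("small", "incomplete") reflect the error cases the authors describe.

```python
import re
import spacy

# Hypothetical vessel patterns (assumed for illustration; not the study's rule set).
VESSEL_PATTERNS = {
    "ICA": re.compile(r"\b(internal carotid artery|ICA)\b", re.IGNORECASE),
    "M1": re.compile(r"\bM1\b", re.IGNORECASE),
    "M2": re.compile(r"\bM2\b", re.IGNORECASE),
    "ACA": re.compile(r"\b(anterior cerebral artery|ACA)\b", re.IGNORECASE),
}
OCCLUSION = re.compile(r"\bocclu\w+\b", re.IGNORECASE)
# Cues that a finding is negated or is not a complete occlusion,
# including the "small"/"incomplete" modifiers the abstract flags as error sources.
NEGATION = re.compile(r"\b(no|not|without|patent|small|incomplete|partial)\b",
                      re.IGNORECASE)

# A blank English pipeline with a rule-based sentencizer is enough for
# sentence-level analysis and needs no pretrained model download.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def detect_lvo(report: str) -> set:
    """Return vessel labels mentioned alongside an occlusion term,
    skipping sentences that contain a negation or hedging cue."""
    found = set()
    for sent in nlp(report).sents:
        text = sent.text
        if not OCCLUSION.search(text) or NEGATION.search(text):
            continue
        for label, pattern in VESSEL_PATTERNS.items():
            if pattern.search(text):
                found.add(label)
    return found
```

Restricting the negation check to the sentence containing the occlusion term is the simplest version of the "surrounding text" problem the abstract describes: a modifier in a different sentence, or a negation with a narrower scope, would still be misread by rules this coarse.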