Abstract
Background: Automating data extraction through natural language processing (NLP) can help obtain salient information from unstructured radiology reports. However, to ensure rigorous quality control, scalable gold-standard protocols for manual verification of automatically extracted data are essential. Our aim was to develop a standardized pipeline for verifying large vessel occlusion (LVO) locations in head CT angiograms (CTAs) as determined by an NLP algorithm. This pipeline can be applied to other variables that require manual verification for quality control of retrospective datasets.

Methods: Our initial dataset included 866 patients with 4184 radiology reports; from this we extracted 489 patients with 513 CTA reports who presented to two tertiary-care institutions with acute large ischemic stroke (>1/2 of the middle cerebral artery territory) from 2006 to 2021. We constructed a rule-based feature-detection system using regular expressions and spaCy, a Python NLP library with sentence parsing and negation detection, to automatically identify the location of LVO. We then developed a pipeline for gold-standard verification that included manual review of all reports and documentation of feedback with root-cause analysis to understand automatic errors. Secondary review of selected reports by an attending neurologist was also performed, allowing correction of errors from the initial manual review.

Results: After initial manual review of CTA reports, the automated NLP algorithm had an accuracy of 78.8% for LVO designation. Secondary MD review was performed on 60/513 (11.7%) reports, with 7/60 (11.7%) found to contain errors from the initial manual review. After complete review, the frequencies of LVO locations were 38.4% internal carotid artery (ICA)/tandem, 56.9% M1, 36.3% M2, 4.3% M3/M4, 15.0% anterior cerebral artery (ACA), and 0.4% indeterminate location. The most common cause of NLP error was the inability to interpret radiographic findings in the context of surrounding text. For example, the algorithm misclassified reports containing modifiers such as "small" or "incomplete," which indicate the absence of complete occlusion.

Conclusion: Tailored rule-based NLP methods can expedite radiology report review for stroke patients, but they may err. Protocolized manual review of radiology reports therefore remains necessary, especially for reports with ambiguous language.
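The abstract describes a rule-based system combining regular expressions with spaCy sentence parsing and negation handling. A minimal sketch of such an approach is shown below; the vessel patterns, negation cues, and function names here are illustrative assumptions, not the study's actual rules, which are not published in the abstract. The modifier cues ("small", "incomplete") reflect the error cases the authors describe.

```python
import re
import spacy

# Hypothetical vessel patterns (assumed for illustration; not the study's rule set).
VESSEL_PATTERNS = {
    "ICA": re.compile(r"\b(internal carotid artery|ICA)\b", re.IGNORECASE),
    "M1": re.compile(r"\bM1\b", re.IGNORECASE),
    "M2": re.compile(r"\bM2\b", re.IGNORECASE),
    "ACA": re.compile(r"\b(anterior cerebral artery|ACA)\b", re.IGNORECASE),
}
OCCLUSION = re.compile(r"\bocclu\w+\b", re.IGNORECASE)
# Cues that a finding is negated or is not a complete occlusion,
# including the "small"/"incomplete" modifiers the abstract flags as error sources.
NEGATION = re.compile(r"\b(no|not|without|patent|small|incomplete|partial)\b",
                      re.IGNORECASE)

# A blank English pipeline with a rule-based sentencizer is enough for
# sentence-level analysis and needs no pretrained model download.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def detect_lvo(report: str) -> set:
    """Return vessel labels mentioned alongside an occlusion term,
    skipping sentences that contain a negation or hedging cue."""
    found = set()
    for sent in nlp(report).sents:
        text = sent.text
        if not OCCLUSION.search(text) or NEGATION.search(text):
            continue
        for label, pattern in VESSEL_PATTERNS.items():
            if pattern.search(text):
                found.add(label)
    return found
```

Restricting the negation check to the sentence containing the occlusion term is the simplest version of the "surrounding text" problem the abstract describes: a modifier in a different sentence, or a negation with a narrower scope, would still be misread by rules this coarse.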