Abstract

Background: Accurate identification of stroke cases from electronic health records (EHRs) is needed for efficient and valid clinical and epidemiological research. A number of studies have evaluated the validity of using ICD-9 and ICD-10 codes for stroke identification. Previous results show that sensitivity and positive predictive value (PPV) for stroke identification are the lowest among cerebrovascular diseases, with notable differences by pathological stroke subtype. While most prevalent cerebrovascular disease cases can be detected using ICD-9 codes 430-438 and ICD-10 codes I60-I69 collectively, a more accurate and comprehensive stroke phenotyping algorithm is needed to identify incident stroke cases from the EHR. In this work, we compared case identification using ICD codes exclusively with case identification using ICD codes plus a natural language processing (NLP) algorithm.
Methods: We developed the NLP component of our stroke algorithm from a list of expert-provided stroke-related keywords covering transient ischemic attack (TIA), ischemic stroke, and hemorrhagic stroke. We validated the hybrid (ICD+NLP) algorithm and compared it with the ICD-exclusive algorithm in a previously established atrial fibrillation (AF) cohort (n=5,062). Clinical notes and ICD codes of all patients after the incident AF event were reviewed by two nurse abstracters to confirm new stroke incidences. All past clinical notes and ICD codes of a subset of patients (n=402) were reviewed to confirm lifetime (prior and current) stroke cases. Manual abstraction results were considered the gold standard for evaluating ICD-only and ICD+NLP automatic extraction. Sensitivity and PPV of both algorithms were calculated.
Results: Among the 5,062 patients, 593 were confirmed to have had a stroke after AF, while 4,469 had no EHR evidence of stroke after AF. The ICD-exclusive algorithm had a sensitivity of 47.2% and a PPV of 47.8% for detecting new stroke incidences. The hybrid stroke algorithm achieved a sensitivity of 92.4% and a PPV of 60.6% for extraction of incident stroke. For extraction of lifetime stroke cases, the hybrid approach achieved a sensitivity of 93.9% and a PPV of 88.7%. Performance for incident stroke extraction is limited because our NLP algorithm has difficulty distinguishing past from current stroke mentions in clinical notes; addressing this is a possible future direction.
Conclusions: We developed and validated a stroke algorithm that performed well for identifying incident and lifetime stroke cases. Adding NLP to the stroke algorithm improved sensitivity and PPV compared with an ICD-exclusive algorithm.
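
To make the evaluation concrete, the following is a minimal sketch of how an ICD-exclusive flag, a hybrid ICD+NLP flag, and the two reported metrics could be computed against a manually abstracted gold standard. It is not the study's implementation. The keyword list, the function names, and the rule that the hybrid flag is the union of the ICD and keyword flags are all assumptions made for illustration; the abstract does not list the expert-provided keywords or state how the two signals are combined.

```python
import re

# Hypothetical keywords for illustration only; the study's expert-provided
# keyword list is not reproduced in the abstract.
STROKE_KEYWORDS = (
    "stroke",
    "transient ischemic attack",
    "cerebral infarction",
    "intracerebral hemorrhage",
)


def has_stroke_icd(code: str) -> bool:
    """Return True for ICD-9 codes 430-438 or ICD-10 codes I60-I69."""
    code = code.strip().upper()
    icd9 = re.match(r"^(\d{3})(\.\d+)?$", code)
    if icd9:
        return 430 <= int(icd9.group(1)) <= 438
    icd10 = re.match(r"^I(\d{2})(\.\w+)?$", code)
    if icd10:
        return 60 <= int(icd10.group(1)) <= 69
    return False


def note_mentions_stroke(note: str) -> bool:
    """Plain substring match; ignores negation and past-versus-current context."""
    text = note.lower()
    return any(keyword in text for keyword in STROKE_KEYWORDS)


def hybrid_flag(codes, notes) -> bool:
    """Assumed combination rule: flag a patient if either signal is present."""
    return any(has_stroke_icd(c) for c in codes) or any(
        note_mentions_stroke(n) for n in notes
    )


def sensitivity_ppv(predicted: set, gold: set):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```

Here `predicted` and `gold` are sets of patient identifiers flagged by the algorithm and confirmed by manual abstraction, respectively. Because the keyword match in this sketch cannot tell a historical stroke mention from a new event, it would be expected to raise sensitivity more than PPV for incident stroke, which is the limitation noted in the Results.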
