Abstract

Background
Better phenotyping of routinely collected coded data would be useful for research and health improvement. For example, the precision of coded data for hemorrhagic stroke (intracerebral hemorrhage [ICH] and subarachnoid hemorrhage [SAH]) may be below 50%. This work aimed to investigate the feasibility and added value of automated methods applied to clinical radiology reports to improve stroke subtyping.

Methods
From a sub-population of 17,249 Scottish UK Biobank participants, we ascertained those with an incident stroke code in hospital, death record or primary care administrative data by September 2015, and ≥ 1 clinical brain scan report. We used a combination of natural language processing and clinical knowledge inference on brain scan reports to assign a stroke subtype (ischemic vs ICH vs SAH) to each participant, and assessed performance by precision and recall at entity and patient levels.

Results
Of 225 participants with an incident stroke code, 207 had a relevant brain scan report and were included in this study. Entity-level precision and recall ranged from 78 to 100%. At patient level, automated methods showed precision and recall that were very good for ICH (both 89%) and good for SAH (both 82%), but, as expected, lower for ischemic stroke (73% and 64%, respectively), suggesting that coded data remain the preferred method for identifying the latter subtype.

Conclusions
Our automated method applied to radiology reports provides a feasible, scalable and accurate solution to improve disease subtyping when used in conjunction with administrative coded health data. Future research should validate these findings in a different population setting.
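The patient-level figures above are per-subtype precision (true positives divided by all participants predicted as that subtype) and recall (true positives divided by all participants whose reference subtype it is). The following is a minimal sketch of that arithmetic, assuming each participant has exactly one predicted and one reference subtype; how the study handled unassigned or multi-subtype participants is not described in this summary. The function name and the participant IDs and labels are hypothetical toy data, not study results.

```python
from collections import Counter

SUBTYPES = ("ischemic", "ICH", "SAH")


def per_subtype_precision_recall(predicted, reference):
    """Patient-level precision and recall for each stroke subtype.

    predicted, reference: dicts mapping participant ID -> single subtype label
    (an assumed input format). Only participants present in both are scored.
    Returns subtype -> (precision, recall); NaN when a denominator is zero.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pid in predicted.keys() & reference.keys():
        pred, gold = predicted[pid], reference[pid]
        if pred == gold:
            tp[pred] += 1          # correct subtype assignment
        else:
            fp[pred] += 1          # wrongly assigned to the predicted subtype
            fn[gold] += 1          # missed for the reference subtype
    scores = {}
    for s in SUBTYPES:
        p_den = tp[s] + fp[s]
        r_den = tp[s] + fn[s]
        precision = tp[s] / p_den if p_den else float("nan")
        recall = tp[s] / r_den if r_den else float("nan")
        scores[s] = (precision, recall)
    return scores


# Hypothetical toy data (not from the study).
predicted = {"p1": "ICH", "p2": "ICH", "p3": "SAH", "p4": "ischemic", "p5": "ischemic"}
reference = {"p1": "ICH", "p2": "ischemic", "p3": "SAH", "p4": "ischemic", "p5": "SAH"}

scores = per_subtype_precision_recall(predicted, reference)
print(scores)  # {'ischemic': (0.5, 0.5), 'ICH': (0.5, 1.0), 'SAH': (1.0, 0.5)}
```

Counting a misclassification as both a false positive for the predicted subtype and a false negative for the reference subtype is what makes precision and recall diverge for a given subtype, as they do for ischemic stroke in the results.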

Highlights

  • Better phenotyping of routinely collected coded data would be useful for research and health improvement

  • Predicting the best possible performance of automated methods in assigning a stroke subtype among UK Biobank (UKB) participants with a hemorrhagic stroke subtype code: to understand the best results that an automated approach could achieve, we further investigated whether clinical brain scan reports contain the information a human expert needs to assign a hemorrhagic stroke subtype

  • One of the 72 participants with multiple relevant reports was assigned to two different stroke subtype categories (ICH and subarachnoid hemorrhage (SAH)) by the automated methods
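The last highlight notes that a participant with several relevant reports can end up in two subtype categories. One simple way this can happen is sketched below, assuming (this is an assumption; the summary does not describe the aggregation rule) that report-level subtype labels are pooled per participant by set union. The function names and the report data are hypothetical.

```python
from collections import defaultdict


def patient_level_subtypes(report_labels):
    """Pool report-level subtype labels per participant by set union.

    report_labels: iterable of (participant_id, label) pairs, one per
    relevant brain scan report (hypothetical input format).
    Returns participant_id -> sorted list of distinct labels.
    """
    by_patient = defaultdict(set)
    for pid, label in report_labels:
        by_patient[pid].add(label)
    return {pid: sorted(labels) for pid, labels in by_patient.items()}


def conflicting(patients):
    """Participants assigned to more than one subtype category."""
    return sorted(pid for pid, labels in patients.items() if len(labels) > 1)


# Hypothetical reports: p2 has one report read as ICH and another as SAH.
reports = [
    ("p1", "ICH"), ("p1", "ICH"),
    ("p2", "ICH"), ("p2", "SAH"),
    ("p3", "ischemic"),
]
patients = patient_level_subtypes(reports)
print(patients)               # {'p1': ['ICH'], 'p2': ['ICH', 'SAH'], 'p3': ['ischemic']}
print(conflicting(patients))  # ['p2']
```

Flagging such participants rather than silently picking one label keeps conflicting cases visible for manual review.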


Summary

Introduction

Better phenotyping of routinely collected coded data would be useful for research and health improvement. UK Biobank (UKB) is a prospective population-based cohort study with extensive phenotypic and genotypic information on > 500,000 participants (www.ukbiobank.ac.uk). It is an open access resource, established … Among subtype-specific codes, hemorrhagic stroke codes may have a precision as low as 42% [2]. This is a limitation for many researchers, because stroke is a heterogeneous disease and the genetic and environmental risk factors identified to date have been highly subtype specific. While coded data can be used to identify all-cause dementia, their accuracy in identifying dementia subtypes, in particular vascular dementia, is much lower [4, 5]. This may limit researchers studying genetic and environmental associations specific to disease subtypes, and automated, scalable methods are urgently needed to improve disease subtyping.
