Codifying unstructured data: A Natural Language Processing approach to extract rich data from clinical letters

Arron Lacey, Jane Lyons, Ashley Akbari, Ronan A Lyons, Beata Fonferko-Shadrach, David Ford, Mark I Rees, Samantha Turner, Owen Pickrell, Angharad Walters, Rod Middleton

doi:10.23889/ijpds.v1i1.354

Open Access

https://doi.org/10.23889/ijpds.v1i1.354

Abstract

Objectives: Electronic healthcare records (EHRs) are the main data sources that facilitate epidemiology research. Routinely collected data, such as primary and secondary care records, are now easily linked to produce novel and high-impact research. There are, however, rich data locked in the free text of clinical letters that are not otherwise translated into EHRs. It is highly desirable to extract this information to strengthen the body of information in existing EHRs. The Swansea Collaborative in Analysis of NLP Research (SCANR) group at Swansea University has been established to evaluate the usage of Natural Language Processing (NLP) platforms for obtaining new clinical data. Our objective was to use Clix Enrich to extract SNOMED concepts from a variety of clinical free texts and to produce EHRs from the extraction process.

Approach: SNOMED concepts contain common items of interest such as diagnosis, medication and symptoms, as well as contextual concepts such as historical reference and negation. Clix Enrich uses the SNOMED dictionary to encode clinical free text (pre-coordinated) and to find contextually correct SNOMED concepts (post-coordinated). We used Clix Enrich to extract meaningful clinical terms from MS and epilepsy consultant letters, as well as from presenting complaint fields from a Welsh Emergency Department (ED).

Results: We tailored Clix Enrich to extract a wide variety of clinical terms from each source (forty texts per source) and validated the extraction accuracy with clinical experts in each domain. Clix Enrich accurately extracted the correct diagnosis for MS, epilepsy and ED attendance (100%, 95% and 80%, respectively), the dosage and frequency of anti-epileptic medication and MS modifying therapy (90% and 100%, respectively), and the EDSS score (94%). We note that a probable source of the discrepancy in extraction accuracy between letter sources is the frequency of abbreviated terms, particularly within the presenting complaint field of the ED sample.

Conclusion: Clix Enrich can be used to accurately extract SNOMED concepts from clinical letters. The resulting datasets are readily available to link to existing EHRs, and can be linked to EHRs that adopt the SNOMED coding structure or backward-compatible hierarchies. Clix Enrich comes with out-of-the-box extraction methods, but the optimum way to extract the correct information would be to build in custom queries, which requires clinical expertise to validate the extraction.
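The dictionary-plus-context idea described in the Approach can be illustrated with a toy sketch. This is not how Clix Enrich works internally, which the abstract does not describe; it only shows the general pattern of matching terms against a dictionary and flagging a match as negated when a cue word precedes it in the same sentence. The dictionary entries, the placeholder concept identifiers, the cue list and the window size are all assumptions made for illustration, and the identifiers are deliberately not real SNOMED codes.

```python
import re

# Hypothetical mini-dictionary: surface term -> placeholder concept ID.
# These IDs are illustrative only and are NOT real SNOMED codes.
TOY_DICTIONARY = {
    "epilepsy": "CONCEPT_EPILEPSY",
    "multiple sclerosis": "CONCEPT_MS",
    "headache": "CONCEPT_HEADACHE",
}

# Assumed cue words that mark a following term as negated.
NEGATION_CUES = ("no", "not", "denies", "without")
WINDOW = 3  # number of preceding words (same sentence) checked for a cue


def extract_concepts(text):
    """Return (concept_id, negated) pairs for dictionary terms found in text."""
    lowered = text.lower()
    results = []
    for term, concept_id in TOY_DICTIONARY.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            # Only look back as far as the start of the current sentence.
            sentence_start = lowered.rfind(".", 0, match.start()) + 1
            context = lowered[sentence_start:match.start()]
            preceding = re.findall(r"[a-z]+", context)[-WINDOW:]
            negated = any(word in NEGATION_CUES for word in preceding)
            results.append((concept_id, negated))
    return results


if __name__ == "__main__":
    print(extract_concepts("Diagnosis: epilepsy. She denies headache."))
```

A sketch like this would miss abbreviated terms that are absent from the dictionary, which matches the abstract's remark that abbreviations in the ED presenting complaint field are a probable cause of lower accuracy.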

Journal: International Journal of Population Data Science
Publication Date: Apr 19, 2017
Citations: 2
License type: CC BY-NC-ND 4.0

