Abstract

One broad goal of biomedical informatics is to generate fully synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and to promote methodological research. A variety of methods exist for generating synthetic EHRs, but they are not capable of generating unstructured text, such as emergency department (ED) chief complaints, histories of present illness, or progress notes. Here, we use the encoder–decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, such as age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that appears to preserve the epidemiological information encoded in the original record–sentence pairs. As a side effect of the model's optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviations and misspellings, and they include none of the personally identifiable information (PII) present in the training data, suggesting that this model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully synthetic EHRs, allowing healthcare providers to share faithful representations of multimodal medical data without compromising patient privacy. This is an important advance that we hope will facilitate the development of machine-learning methods for clinical decision support, disease surveillance, and other data-hungry applications in biomedical informatics.

Highlights

  • The wide adoption of electronic health record (EHR) systems has led to the creation of large amounts of healthcare data

  • Because they contain personally identifiable patient information, much of which is protected under the Health Insurance Portability and Accountability Act (HIPAA), these data are often difficult for providers to share with investigators outside their organizations, limiting their feasibility for use in research

  • By training one neural network to generate fake records and another to discriminate those fakes from real ones, a generative adversarial network (GAN) can learn the distribution of both count- and binary-valued variables in EHRs and produce patient-level records that preserve the analytic properties of the data without sacrificing patient privacy
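The adversarial setup in the last highlight can be sketched with a toy one-dimensional example in plain Python. This is a minimal illustration only: Choi et al.'s GAN operates on high-dimensional count- and binary-valued records, whereas here a scalar "generator" parameter `w` learns to match a scalar real-data distribution, and the gradients are written out in closed form. All names (`train_toy_gan`, `real_mean`, etc.) are hypothetical.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_toy_gan(steps=3000, lr=0.05, real_mean=4.0):
    """Alternate discriminator and generator updates on a 1-D toy problem.
    Generator: x_fake = w + noise; discriminator: D(x) = sigmoid(a*x + b)."""
    w, a, b = 0.0, 0.1, 0.0  # generator parameter; discriminator parameters
    for _ in range(steps):
        x_real = real_mean + random.gauss(0, 0.1)
        x_fake = w + random.gauss(0, 0.1)
        d_real = sigmoid(a * x_real + b)
        d_fake = sigmoid(a * x_fake + b)
        # Discriminator step: ascend log D(real) + log(1 - D(fake))
        a += lr * ((1 - d_real) * x_real - d_fake * x_fake)
        b += lr * ((1 - d_real) - d_fake)
        # Generator step: ascend log D(fake), i.e. try to fool the discriminator
        d_fake = sigmoid(a * x_fake + b)
        w += lr * (1 - d_fake) * a
    return w, a, b
```

After training, the generator parameter `w` drifts toward the real-data mean, which is the one-dimensional analogue of the GAN learning the joint distribution of the record variables.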



Introduction

The wide adoption of electronic health record (EHR) systems has led to the creation of large amounts of healthcare data. We explore the use of encoder–decoder models, a kind of deep learning algorithm, to generate natural-language text for EHRs, filling an existing gap and increasing the feasibility of using generative models like Choi et al.'s GAN to create high-quality healthcare datasets for secondary uses. As their name implies, encoder–decoder models comprise two (conventionally recurrent) neural networks: one that encodes the input sequence as a single dense vector, and one that decodes this vector into the target sequence.
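This encode-then-decode structure can be illustrated with a toy sketch in plain Python. The embeddings below are random and untrained, and the vocabulary, `encode`, and `decode` are illustrative stand-ins, not the trained model described in the paper; a real encoder and decoder would be learned recurrent networks.

```python
import math
import random

random.seed(0)
VOCAB = ["<s>", "</s>", "pain", "chest", "fever", "cough"]
DIM = 8
# One random embedding per token (a stand-in for learned parameters)
EMB = {t: [random.uniform(-1, 1) for _ in range(DIM)] for t in VOCAB}

def encode(tokens):
    """Fold the input sequence into one dense vector, mimicking the
    final hidden state of an encoder RNN."""
    h = [0.0] * DIM
    for t in tokens:
        h = [math.tanh(hi + ei) for hi, ei in zip(h, EMB[t])]
    return h

def decode(h, max_len=5):
    """Greedy decoding: at each step emit the token whose embedding best
    matches the current state, then update the state, until </s>."""
    out = []
    for _ in range(max_len):
        scores = {t: sum(x * e for x, e in zip(h, EMB[t]))
                  for t in VOCAB if t != "<s>"}
        best = max(scores, key=scores.get)
        if best == "</s>":
            break
        out.append(best)
        h = [math.tanh(hi - ei) for hi, ei in zip(h, EMB[best])]
    return out
```

In the paper's setting, the input sequence would be the discrete record variables (age group, gender, discharge diagnosis) and the decoded target sequence would be the chief complaint text.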
