A study of deep learning methods for de-identification of clinical notes in cross-institute settings

Xi Yang, Qian Li, Chih-Yin Lee, Jiang Bian, Yonghui Wu, Tianchen Lyu, William R Hogan

doi:10.1186/s12911-019-0935-4


Open Access

https://doi.org/10.1186/s12911-019-0935-4


Abstract

Background: De-identification is a critical technology to facilitate the use of unstructured clinical text while protecting patient privacy and confidentiality. The clinical natural language processing (NLP) community has invested great effort in developing methods and corpora for de-identification of clinical notes. These annotated corpora are valuable resources for developing automated systems to de-identify clinical text at local hospitals. However, existing studies often used training and test data collected from the same institution, and few studies have explored automated de-identification in cross-institute settings. The goal of this study is to examine deep learning-based de-identification methods in a cross-institute setting, identify the bottlenecks, and provide potential solutions.

Methods: We created a de-identification corpus using a total of 500 clinical notes from University of Florida (UF) Health, developed deep learning-based de-identification models using the 2014 i2b2/UTHealth corpus, and evaluated their performance on the UF corpus. We compared five different word embeddings trained on general English text, clinical text, and biomedical literature, explored lexical and linguistic features, and compared two strategies to customize the deep learning models using UF notes and resources.

Results: Pre-trained word embeddings from a general English corpus achieved better performance than embeddings from de-identified clinical text and biomedical literature. The performance of deep learning models trained only on the i2b2 corpus dropped significantly when applied to the corpus annotated at UF Health (strict and relaxed F1 scores dropped from 0.9547 and 0.9646 to 0.8568 and 0.8958, respectively). Linguistic features could further improve de-identification performance in cross-institute settings. After customizing the models using UF notes and resources, the best model achieved strict and relaxed F1 scores of 0.9288 and 0.9584, respectively.

Conclusions: It is necessary to customize de-identification models using local clinical text and other resources when applying them in cross-institute settings. Fine-tuning is a potential solution to re-use pre-trained parameters and reduce the training time needed to customize deep learning-based de-identification models trained on a clinical corpus from a different institution.
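The abstract reports results as strict and relaxed F1 scores. The sketch below illustrates how span-level F1 can differ under two matching criteria. The exact rules used in the study are not stated here, so both definitions are assumptions: "strict" is taken to mean identical character boundaries and PHI type, and "relaxed" is taken to mean the same PHI type with overlapping boundaries. The span format, the function names, and the toy offsets are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical PHI annotation: (start_char, end_char_exclusive, phi_type).
Span = Tuple[int, int, str]


def strict_match(a: Span, b: Span) -> bool:
    """Assumed strict criterion: identical boundaries and identical PHI type."""
    return a == b


def relaxed_match(a: Span, b: Span) -> bool:
    """Assumed relaxed criterion: same PHI type and overlapping character ranges."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """F1 pooled over all PHI spans under the given matching criterion."""
    # Precision: share of predicted spans that match at least one gold span.
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Recall: share of gold spans matched by at least one predicted span.
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up offsets): one exact hit, one boundary error, one miss.
gold = [(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "ID")]
pred = [(0, 10, "NAME"), (25, 33, "DATE"), (60, 65, "ID")]
print(round(span_f1(gold, pred, strict_match), 4))   # 0.3333
print(round(span_f1(gold, pred, relaxed_match), 4))  # 0.6667
```

The boundary error on the DATE span counts as a miss under the strict criterion but as a hit under the relaxed one. This is why relaxed F1 is never lower than strict F1 on the same predictions, and it is consistent with the relaxed scores in the abstract being the higher of each pair.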

Highlights

De-identification is a critical technology to facilitate the use of unstructured clinical text while protecting patient privacy and confidentiality
It is necessary to customize de-identification models using local clinical text and other resources when applied in cross-institute settings
We developed de-identification models using the clinical corpus from the 2014 i2b2/UTHealth challenge and evaluated their performance on clinical notes collected from University of Florida (UF) Health

Summary

Introduction

De-identification is a critical technology to facilitate the use of unstructured clinical text while protecting patient privacy and confidentiality. The clinical natural language processing (NLP) community has invested great effort in developing methods and corpora for de-identification of clinical notes. Because manual de-identification is time-consuming and impractical for large volumes of clinical text, researchers have developed NLP methods to automatically identify and remove protected health information (PHI) from clinical notes [4, 5]. Several de-identification corpora have been annotated to support the training of supervised machine learning methods [7,8,9,10]. These annotated corpora are valuable resources for developing automated clinical NLP systems to de-identify clinical text at local hospitals. However, few studies have explored automated de-identification of clinical notes in cross-institute settings [11,12,13].
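Automatically "identifying and removing" PHI is commonly framed as sequence labeling followed by a replacement step. The source does not describe the tagging scheme or the output format. The sketch below therefore assumes that a trained model has already assigned one BIO label per token (for example, `B-DATE` and `I-DATE`). It shows only the post-processing step: grouping the labels into PHI spans and replacing each span with a `[TYPE]` placeholder. The function names, the label set, and the example sentence are hypothetical.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO labels into (start_token, end_token_exclusive, phi_type) spans."""
    spans = []
    start, phi_type = None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel "O" closes a trailing span
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != phi_type
        )
        if label == "O" or starts_new:
            if start is not None:
                spans.append((start, i, phi_type))
                start, phi_type = None, None
            if starts_new:  # a stray I- tag is leniently treated as a B- tag
                start, phi_type = i, label[2:]
    return spans


def mask_phi(tokens: List[str], labels: List[str]) -> str:
    """Replace each labeled PHI span with a [TYPE] placeholder."""
    out, i = [], 0
    for start, end, phi_type in bio_to_spans(labels):
        out.extend(tokens[i:start])  # keep non-PHI tokens before the span
        out.append(f"[{phi_type}]")
        i = end
    out.extend(tokens[i:])
    return " ".join(out)


# Hypothetical note fragment with hypothetical model output.
tokens = ["Seen", "by", "Dr.", "John", "Smith", "on", "03/14/2019", "."]
labels = ["O", "O", "O", "B-DOCTOR", "I-DOCTOR", "O", "B-DATE", "O"]
print(bio_to_spans(labels))      # [(3, 5, 'DOCTOR'), (6, 7, 'DATE')]
print(mask_phi(tokens, labels))  # Seen by Dr. [DOCTOR] on [DATE] .
```

Placeholder tags are only one option. A de-identification pipeline could instead substitute realistic surrogates, which this sketch does not attempt.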



Journal: BMC Medical Informatics and Decision Making
Publication Date: Dec 1, 2019
Citations: 53
License type: open-access


Similar Papers

A case study on decompounding in Indian language IR
Siba Sankar Sahu ... Sukomal Pal
Natural Language Processing | 03 Jun 2024

Deep learning–based radiomic nomograms for predicting Ki67 expression in prostate cancer
Shuitang Deng ... Jingfeng Ding
BMC Cancer | VOL. 23 | 08 Jul 2023

Mortality prediction on unsupervised and semi-supervised clusters of medical intensive care unit patients based on MIMIC-II database
M.K Lintu ... Asha Kamath
Informatics in Medicine Unlocked | VOL. 39 | 01 Jan 2023

Deep chemometrics: Validation and transfer of a global deep near‐infrared fruit model to use it on a new portable instrument
Puneet Mishra ... Dário Passos
Journal of Chemometrics | VOL. 35 | 21 Jul 2021
