Evaluation of Automated Public De-Identification Tools on a Corpus of Radiology Reports.

Jackson M Steinkamp, Tessa S Cook, Taylor Pomeranz, Jason Adleberg, Charles E Kahn

doi:10.1148/ryai.2020190137

Abstract

The purpose of this study was to evaluate publicly available de-identification tools on a large corpus of narrative-text radiology reports. In this retrospective study, 21 categories of protected health information (PHI) were annotated in 2503 radiology reports from a large multihospital academic health system, collected between January 1, 2012 and January 8, 2019. A subset of 1023 reports served as a test set; the remaining 1480 reports were used as domain-specific training data. The types and frequencies of PHI present within the reports were tallied. Five public de-identification tools were evaluated: MITRE Identification Scrubber Toolkit, U.S. National Library of Medicine-Scrubber, Massachusetts Institute of Technology de-identification software, Emory Health Information DE-identification (HIDE) software, and Neuro named-entity recognition (NeuroNER). The tools were compared using metrics including recall, precision, and F1 score (the harmonic mean of recall and precision) for each category of PHI. The annotators identified 3528 spans of PHI text within the 2503 reports. Cohen κ for interrater agreement was 0.938. Dates accounted for the majority of PHI found in the dataset of radiology reports (n = 2755 [78%]). The two best-performing tools both used machine learning methods: NeuroNER (precision, 94.5%; recall, 92.6%; microaveraged F1 score [F1], 93.6%) and Emory HIDE (precision, 96.6%; recall, 88.2%; F1, 92.2%). However, none of the tools exceeded 50% F1 on the important patient names category. PHI appeared infrequently within the corpus of reports studied, which created difficulties for training machine learning systems. Out-of-the-box de-identification tools achieved limited performance on the corpus of radiology reports, suggesting the need for further advancements in public datasets and trained models. Supplemental material is available for this article. See also the commentary by Tenenholtz and Wood in this issue. © RSNA, 2020.
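The metrics named in the abstract can be illustrated with a minimal sketch. It assumes span-level true-positive, false-positive, and false-negative counts per PHI category; the `Counts` class, the helper names, and the toy counts are hypothetical and are not taken from the study or from any of the evaluated tools. Microaveraging is shown as pooling the counts across categories before computing precision, recall, and F1.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    """Span-level tallies for one PHI category (hypothetical structure)."""
    tp: int  # predicted spans that match an annotated span
    fp: int  # predicted spans with no matching annotation
    fn: int  # annotated spans that the tool missed


def precision(c: Counts) -> float:
    """Fraction of predicted spans that are correct."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    """Fraction of annotated spans that were found."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def micro_average(per_category: Dict[str, Counts]) -> Counts:
    """Pool counts across categories; metrics are then computed once."""
    return Counts(
        tp=sum(c.tp for c in per_category.values()),
        fp=sum(c.fp for c in per_category.values()),
        fn=sum(c.fn for c in per_category.values()),
    )


# Made-up counts for illustration only; these are not data from the study.
toy = {
    "date": Counts(tp=90, fp=5, fn=10),
    "patient_name": Counts(tp=2, fp=1, fn=3),
}
pooled = micro_average(toy)
micro_f1 = f1_from(precision(pooled), recall(pooled))

# Harmonic mean of the rounded precision/recall pairs reported in the abstract.
neuroner_f1 = f1_from(0.945, 0.926)
hide_f1 = f1_from(0.966, 0.882)

print(f"toy micro-F1: {micro_f1:.3f}")
print(f"NeuroNER F1 from rounded P/R: {neuroner_f1:.3f}")
print(f"Emory HIDE F1 from rounded P/R: {hide_f1:.3f}")
```

Because the pooled counts are dominated by the most frequent category, a microaveraged score can stay high even when a rare category such as patient names is handled poorly, which is consistent with the pattern the abstract describes. Applying the harmonic mean to the rounded values in the abstract gives about 93.5% for NeuroNER and about 92.2% for Emory HIDE; the small gap from the reported 93.6% is presumably a rounding effect, since the published figures would have been computed from unrounded counts.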


Journal: Radiology. Artificial intelligence | Publication Date: Oct 14, 2020
Citations: 12

Similar Papers

Automated deidentification of radiology reports combining transformer and "hide in plain sight" rule-based methods.
Pierre J Chambon ... Curtis P Langlotz
Journal of the American Medical Informatics Association : JAMIA | VOL. 30
23 Nov 2022

Ensemble Approaches to Recognize Protected Health Information in Radiology Reports.
Hannah Horng ... Tessa S Cook
Journal of digital imaging | VOL. 35
17 Jun 2022

Automated de-identification of free-text medical records
Ishna Neamatullah ... Gari D Clifford
BMC Medical Informatics and Decision Making | VOL. 8
24 Jul 2008

Advanced Technology and Confidentiality in Hand Surgery
Nash H Naam ... Sandy Sanbar
Journal of Hand Surgery | VOL. 40
01 Sep 2014
