Anonymization Framework for Securing Protected Health Information in a Complex Dataset of Medical Narratives

Saman Hina, Syed Abbas Ali, Raheela Asif

doi:10.22581/muet1982.2003.16

Abstract

In the medical domain, it is imperative that the protection of information does not overlook any individual. The medical research community encourages the use of real-time datasets for research purposes. These datasets contain structured and unstructured (natural language free text) information that can be useful to researchers in various disciplines, including computational linguistics. However, they cannot be distributed without anonymization of Protected Health Information (PHI), because disclosing PHI that can identify an individual (such as name, age, or address) is unethical. We therefore present a rule-based Natural Language Processing (NLP) anonymization system, developed on a challenging corpus containing medical narratives and ICD-10 codes (medical codes). This anonymization module can be used to pre-process a corpus that contains identifiable information. The corpus used in this research contains 2534 PHIs in 1984 medical records in total. 15% of the labelled corpus was used to improve the guidelines for identifying and classifying PHI groups, and 85% was held out for evaluation. Our anonymization system follows a two-step process: (1) identification and cataloging of PHIs into four PHI categories ('Patients Name', 'Doctors Name', 'Other Name' [names other than patients and doctors], 'Place Name'), and (2) anonymization of PHIs by replacing each identified PHI with its respective PHI category. Our method uses basic language processing, dictionaries, rules and heuristics to identify, classify and anonymize PHIs with PHI categories. We evaluate with standard metrics against a human-annotated gold standard: our system achieves an F-measure of 100%, an increase of 39% over the baseline results, which supports the reliability of the anonymized data for research use.
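The two-step process described in the abstract (identify and categorize PHIs, then replace each PHI with its category) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the cue words, the `PLACE_NAMES` dictionary, the `NAME` pattern, and the bracketed replacement format are all assumptions made for this example, since the abstract does not list the actual dictionaries, rules, or heuristics.

```python
import re

# Hypothetical capitalized-name pattern (one or more capitalized words).
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical place dictionary; a real system would load a much larger one.
PLACE_NAMES = ["Karachi", "Hyderabad"]

# Hypothetical cue-based rules, one per PHI category named in the abstract.
# Group 1 of each pattern is the PHI span to be replaced.
RULES = [
    ("Doctors Name", re.compile(rf"\b(?:Dr|Prof)\.? ({NAME})")),
    ("Patients Name", re.compile(rf"\b(?:Mr|Mrs|Ms)\.? ({NAME})")),
    ("Other Name", re.compile(rf"\b(?:son|daughter|wife|husband),? ({NAME})")),
    ("Place Name", re.compile(
        r"\b(" + "|".join(map(re.escape, PLACE_NAMES)) + r")\b")),
]


def identify_phi(text):
    """Step 1: return sorted (start, end, category) spans of detected PHIs."""
    spans = []
    for category, pattern in RULES:
        for match in pattern.finditer(text):
            start, end = match.span(1)
            # Keep a span only if it does not overlap one found by an earlier rule.
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, category))
    return sorted(spans)


def anonymize(text):
    """Step 2: replace each identified PHI with its category label."""
    pieces, cursor = [], 0
    for start, end, category in identify_phi(text):
        pieces.append(text[cursor:start])
        pieces.append(f"[{category}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "Mr. Ali Khan was seen by Dr. Sara Ahmed in Karachi."
    print(anonymize(note))
```

Separating identification from replacement mirrors the two steps in the abstract and lets the identified spans be compared against a gold standard before any text is rewritten.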

Highlights

In the medical domain, researchers are keen to use real-time data instead of fictional data
Datasets are distributed after the approval of a particular user agreement based on ethical requirements

Summary

Introduction

Researchers are keen to use real-time data instead of fictional data. Those responsible for the research distribute datasets after the approval of a particular user agreement based on ethical requirements. These datasets are developed for specific research tasks and may not be reused for other research problems. Researchers therefore either face the unavailability of real-time datasets or must develop their own dataset, which is time-consuming.



Journal: Mehran University Research Journal of Engineering and Technology
Publication Date: Jul 1, 2020
Citations: 1
License type: CC BY 4.0


