Evaluating Common De-Identification Heuristics for Personal Health Information

Khaled El Emam, Youenn Drouet, Michael Power, Sam Jabbouri, Scott Sams

doi:10.2196/jmir.8.4.e28

Abstract

Background: With the growing adoption of electronic medical records, there are increasing demands for the use of this electronic clinical data in observational research. A frequent ethics board requirement for such secondary use of personal health information in observational research is that the data be de-identified. De-identification heuristics are provided in the Health Insurance Portability and Accountability Act Privacy Rule, funding agency and professional association privacy guidelines, and common practice.

Objective: The aim of the study was to evaluate whether the re-identification risks due to record linkage are sufficiently low when following common de-identification heuristics and whether the risk is stable across sample sizes and data sets.

Methods: Two methods were followed to construct identification data sets. Re-identification attacks were simulated on these. For each data set we varied the sample size down to 30 individuals, and for each sample size evaluated the risk of re-identification for all combinations of quasi-identifiers. The combinations of quasi-identifiers that were low risk more than 50% of the time were considered stable.

Results: The identification data sets we were able to construct were the list of all physicians and the list of all lawyers registered in Ontario, using 1% sampling fractions. The quasi-identifiers of region, gender, and year of birth were found to be low risk more than 50% of the time across both data sets. The combination of gender and region was also found to be low risk more than 50% of the time. We were not able to create an identification data set for the whole population.

Conclusions: Existing Canadian federal and provincial privacy laws help explain why it is difficult to create an identification data set for the whole population. That such examples of high re-identification risk exist for mainstream professions makes a strong case for not disclosing the high-risk variables and their combinations identified here. For professional subpopulations with published membership lists, many variables often needed by researchers would have to be excluded or generalized to ensure consistently low re-identification risk. Data custodians and researchers need to consider other statistical disclosure techniques for protecting privacy.
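
The evaluation procedure described in the Methods can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the risk measure or the low-risk cutoff, so the sketch assumes risk is the fraction of sampled records that match exactly one record in the identification data set, and it uses a hypothetical threshold of 0.05. The function names, the `trials` parameter, and the practice of drawing samples from the identification data set itself are likewise assumptions made for the example.

```python
import random
from itertools import combinations


def unique_match_rate(sample, id_data, quasi_ids):
    """Assumed risk measure: share of sample records whose quasi-identifier
    values match exactly one record in the identification data set."""
    counts = {}
    for rec in id_data:
        key = tuple(rec[q] for q in quasi_ids)
        counts[key] = counts.get(key, 0) + 1
    hits = sum(
        1 for rec in sample
        if counts.get(tuple(rec[q] for q in quasi_ids), 0) == 1
    )
    return hits / len(sample)


def stable_combinations(id_data, quasi_ids, sample_sizes,
                        risk_threshold=0.05, trials=20, seed=0):
    """Return the quasi-identifier combinations that are low risk in more
    than 50% of simulated samples (the stability rule from the Methods).

    risk_threshold and trials are hypothetical values chosen for this sketch.
    """
    rng = random.Random(seed)
    stable = []
    # Enumerate every non-empty combination of quasi-identifiers.
    for r in range(1, len(quasi_ids) + 1):
        for combo in combinations(quasi_ids, r):
            low_risk_runs = 0
            total_runs = 0
            for n in sample_sizes:
                for _ in range(trials):
                    # Assumption: the disclosed sample is drawn from the
                    # same population the identification data set lists.
                    sample = rng.sample(id_data, n)
                    risk = unique_match_rate(sample, id_data, combo)
                    total_runs += 1
                    if risk <= risk_threshold:
                        low_risk_runs += 1
            if low_risk_runs / total_runs > 0.5:
                stable.append(combo)
    return stable
```

Calling `stable_combinations(records, ["region", "gender", "yob"], sample_sizes=[100, 50, 30])` on a list of dictionaries would return the combinations that meet the stability rule under these assumptions. A different risk measure or cutoff could change which combinations qualify.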

Highlights

The adoption of electronic medical records (EMRs) is growing [1,2,3,4,5].
In this paper we evaluate whether common de-identification heuristics ensure a low level of re-identification risk across different data sets and sample sizes.
The common heuristics we evaluate are the union of a subset of those defined in the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, those currently practised in clinical research, and those presented in privacy guidelines.

Summary

Introduction

The adoption of electronic medical records (EMRs) is growing [1,2,3,4,5]. Researchers are increasingly turning to EMRs as a source of clinically relevant patient data. There are calls for EMRs to support secondary uses of this data for observational studies, such as epidemiologic and health services research [6].

Journal: Journal of Medical Internet Research
Publication Date: Nov 21, 2006
Citations: 68
License type: cc-by
