Abstract

Background: De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. One type of attack that de-identification protects against is linking the disclosed patient data with public and semi-public registries. Uniqueness is a commonly used measure of re-identification risk under this attack. If uniqueness can be measured accurately, then the risk from this kind of attack can be managed. In practice, it is often not possible to measure uniqueness directly, so it must be estimated.

Methods: We evaluated the accuracy of uniqueness estimators on clinically relevant data sets. Four candidate estimators were identified, either because they had been evaluated in the past and found to have good accuracy or because they were new and had not been evaluated comparatively before: the Zayatz estimator, the slide negative binomial estimator, Pitman’s estimator, and mu-argus. A Monte Carlo simulation was performed to evaluate the uniqueness estimators on six clinically relevant data sets. We varied the sampling fraction and the uniqueness in the population (the value being estimated). The median relative error and inter-quartile range of the uniqueness estimates were measured across 1000 runs.

Results: There was no single estimator that performed well across all of the conditions. We developed a decision rule which selected between the Pitman, slide negative binomial and Zayatz estimators depending on the sampling fraction and the difference between estimates. This decision rule had the best consistent median relative error across multiple conditions and data sets.

Conclusion: This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets. The decision rule provides a reliable way to measure re-identification risk.
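The evaluation design described above can be illustrated with a minimal Python sketch. It is not the authors' code and does not implement any of the four estimators in the study. It assumes that uniqueness means the fraction of records whose quasi-identifier combination occurs exactly once, and it uses the raw sample uniqueness as a deliberately naive stand-in estimator. The function names `uniqueness` and `simulate` and the toy population are hypothetical, introduced only for illustration.

```python
import random
import statistics
from collections import Counter


def uniqueness(records):
    """Fraction of records whose quasi-identifier tuple occurs exactly once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)


def simulate(population, sampling_fraction, runs=1000, seed=0):
    """Median and inter-quartile range of the relative error across runs.

    Assumption: the 'estimator' here is simply the uniqueness observed in
    the sample, used as a naive placeholder for a real estimator.
    """
    rng = random.Random(seed)
    true_value = uniqueness(population)  # must be > 0 for a relative error
    n = round(sampling_fraction * len(population))
    errors = []
    for _ in range(runs):
        sample = rng.sample(population, n)  # simple random sample, no replacement
        errors.append((uniqueness(sample) - true_value) / true_value)
    q1, median, q3 = statistics.quantiles(errors, n=4)
    return median, q3 - q1


# Toy population (hypothetical): 50 groups of 20 identical records,
# plus 100 records that are unique on their quasi-identifiers.
population = [(i % 50, i % 2) for i in range(1000)]
population += [(100 + i, 0) for i in range(100)]

for fraction in (0.1, 0.5, 1.0):
    median_error, iqr = simulate(population, fraction, runs=200)
    print(f"fraction={fraction}: median relative error={median_error:.3f}, IQR={iqr:.3f}")
```

In this sketch the naive estimate tends to overstate population uniqueness at small sampling fractions, because records that share quasi-identifiers in the population can appear alone in a sample. This is the kind of bias that dedicated estimators are intended to correct.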

Highlights

  • De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research

  • US studies have shown that attitudes toward privacy and confidentiality of the census are predictive of people’s participation [3,4], and that there is a positive association between belief in the confidentiality of census records and the level of trust one has in the government [5]

  • The number of records affected by breaches is already quite high: the U.S. Department of Health and Human Services (HHS) has reported 252 breaches at health information custodians, each involving more than 500 records, from the end of September 2009 to the end of 2010 [9]


Summary

Introduction

De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. A number of US studies have shown that attitudes toward privacy and confidentiality of the census are predictive of people’s participation [3,4], and that there is a positive association between belief in the confidentiality of census records and the level of trust one has in the government [5]. These trust effects are amplified when the information collected is of a sensitive nature [5,6]. At the same time, there is increasing pressure to make individual-level health data more generally available, and in some cases publicly available, for research and policy purposes [10,11,12,13,14,15,16,17,18,19,20,21,22,23].

