Abstract

Background: Various methods based on k-anonymity have been proposed for publishing medical data while preserving privacy. However, the k-anonymity property assumes that adversaries possess fixed background knowledge. Although differential privacy overcomes this limitation, it is specialized for aggregated results, making it difficult to obtain high-quality microdata. To address this issue, we propose a differentially private medical microdata release method with high utility.

Methods: We propose a method for anonymizing medical data under differential privacy. To improve data utility, and in particular to preserve informative attribute values, the proposed method adopts three data perturbation approaches: (1) generalization, (2) suppression, and (3) insertion. The proposed method produces an anonymized dataset that is nearly optimal with regard to utility while preserving privacy.

Results: The proposed method achieves lower information loss than existing methods. Based on a real-world case study, we show that the results of data analyses using the original dataset and those obtained using a dataset anonymized via the proposed method are considerably similar.

Conclusions: We propose a novel differentially private anonymization method that preserves informative values for the release of medical data. Through experiments, we show that the utility of medical data anonymized via the proposed method is significantly better than that achieved by existing methods.
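To make the three perturbation operations concrete, the following is a minimal Python sketch of how generalization, suppression, and record insertion could act on toy quasi-identifier values. The attribute names, the age hierarchy, and the record layout are illustrative assumptions and not the exact procedure of the proposed method.

```python
# Illustrative sketch (not the paper's exact algorithm): the three perturbation
# operations the method combines -- generalization, suppression, and insertion --
# applied to toy quasi-identifier values.

def generalize_age(age, bucket=10):
    """Generalize an exact age into a coarser interval, e.g. 37 -> '30-39'."""
    lo = (age // bucket) * bucket
    return f"{lo}-{lo + bucket - 1}"

def suppress(value):
    """Suppress a value entirely, replacing it with a placeholder."""
    return "*"

def insert_counterfeit(equivalence_class, template):
    """Insert a counterfeit record that matches the class's quasi-identifiers."""
    fake = dict(template)          # copy the shared quasi-identifier values
    fake["counterfeit"] = True     # marker used here for illustration only
    equivalence_class.append(fake)
    return equivalence_class

record = {"age": 37, "zipcode": "13053", "diagnosis": "flu"}
record["age"] = generalize_age(record["age"])        # generalization
record["zipcode"] = suppress(record["zipcode"])      # suppression
group = insert_counterfeit([record], {"age": "30-39", "zipcode": "*"})
print(group)
```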

Highlights

  • Various methods based on k-anonymity have been proposed for publishing medical data while preserving privacy

  • We propose a data anonymization method based on the differential privacy theory

  • To measure the information loss caused by generalization, we introduce the Normalized Certainty Penalty (NCP) [18], as sketched below

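As a rough illustration of the NCP metric referenced in the highlights, the sketch below computes the penalty for a single generalized numerical or categorical attribute, following the common definition from the anonymization literature; the helper names, weights, and example domains are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of the Normalized Certainty Penalty (NCP) [18] per attribute.

def ncp_numerical(lower, upper, domain_min, domain_max):
    """NCP of a value generalized to the interval [lower, upper]
    over a numeric domain [domain_min, domain_max]."""
    if domain_max == domain_min:
        return 0.0
    return (upper - lower) / (domain_max - domain_min)

def ncp_categorical(num_leaves_covered, num_leaves_total):
    """NCP of a categorical value generalized to a taxonomy node covering
    num_leaves_covered leaf values (an ungeneralized value costs 0)."""
    if num_leaves_covered <= 1:
        return 0.0
    return num_leaves_covered / num_leaves_total

# Example: age 37 generalized to [30, 39] over the domain [0, 99]
print(ncp_numerical(30, 39, 0, 99))   # ~0.09 -> small information loss
print(ncp_categorical(3, 12))         # node covering 3 of 12 leaf values
```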

Summary

Methods

Problem settings: Consider a data holder who possesses a dataset D containing multi-dimensional records, where each record belongs to a unique individual. In IPA, we allocate the privacy budget over four different parts: the suppression threshold, the number of counterfeit records, the determination of the informative attribute value of a counterfeit record, and the choice of an anonymized dataset; their privacy guarantees are proved by Theorems 5, 6, 7, and 8, respectively. Adding counterfeit records, generated independently from the Laplace distribution Lap(1/ε_insertion), to each equivalence class achieves ε_insertion-differential privacy (Theorem 6). Determining the informative attribute values of inserted records based on Eq. (4) achieves ε_value-differential privacy (Theorem 7). IPA achieves (ε_suppression + ε_insertion + ε_value + ε_candidates)-differential privacy (Theorem 8). We showed that each operation is differentially private on its own; as these operations run on the same dataset, by sequential composition (Theorem 3), IPA achieves (ε_suppression + ε_insertion + ε_value + ε_candidates)-differential privacy.
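As a hedged sketch of the insertion step and the budget composition described above, the code below draws a per-equivalence-class count of counterfeit records from Lap(1/ε_insertion) and sums the four per-step budgets. The variable names, the rounding and clamping of negative draws to zero, and the concrete budget values are assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def num_counterfeit_records(epsilon_insertion):
    """Draw Laplace noise with scale 1/epsilon_insertion and round it to a
    non-negative integer count of counterfeit records to insert."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon_insertion)
    return max(0, int(round(noise)))

# Per-step budgets (illustrative values); by sequential composition the
# overall guarantee is their sum.
eps_suppression, eps_insertion, eps_value, eps_candidates = 0.25, 0.25, 0.25, 0.25
eps_total = eps_suppression + eps_insertion + eps_value + eps_candidates

for ec in ["EC1", "EC2", "EC3"]:
    print(ec, "->", num_counterfeit_records(eps_insertion), "counterfeit records")
print("total budget:", eps_total)
```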

