Publishing set-valued data via differential privacy

Rui Chen, Bipin C Desai, Benjamin C M Fung, Li Xiong, Noman Mohammed

doi:10.14778/3402707.3402744

Abstract

Set-valued data provides enormous opportunities for various data mining tasks. In this paper, we study the problem of publishing set-valued data for data mining tasks under the rigorous differential privacy model. All existing data publishing methods for set-valued data are based on partition-based privacy models, for example k-anonymity, which are vulnerable to privacy attacks based on background knowledge. In contrast, differential privacy provides strong privacy guarantees independent of an adversary's background knowledge and computational power. Existing data publishing approaches for differential privacy, however, are inadequate in terms of both utility and scalability in the context of set-valued data because of its high dimensionality. We demonstrate that set-valued data can be efficiently released under differential privacy with guaranteed utility with the help of context-free taxonomy trees. We propose a probabilistic top-down partitioning algorithm to generate a differentially private release, which scales linearly with the input data size. We also discuss the applicability of our idea to the context of relational data. We prove that our result is (ε, δ)-useful for the class of counting queries, the foundation of many data mining tasks. Through extensive experiments on real-life set-valued datasets, we show that our approach maintains high utility for counting queries and frequent itemset mining and scales to large datasets.
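To make the notion of a counting query over set-valued data concrete, the following is a minimal sketch of answering one such query with Laplace noise. It is not the top-down partitioning algorithm proposed in the paper; it only illustrates the standard Laplace mechanism on a toy dataset. The function names, the toy records, and the `privacy_budget` parameter are all assumptions introduced here for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def true_count(records, itemset):
    """Counting query: number of records that contain every item in itemset."""
    return sum(1 for record in records if itemset <= record)


def noisy_count(records, itemset, privacy_budget, rng):
    """Answer the counting query with Laplace noise (illustrative only).

    Adding or removing one record changes the count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / privacy_budget.
    """
    return true_count(records, itemset) + laplace_noise(1.0 / privacy_budget, rng)


# Hypothetical toy dataset: each record is a set of items.
records = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "d"},
    {"a", "d"},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
answer = noisy_count(records, {"a", "b"}, privacy_budget=1.0, rng=rng)
print(f"true count = {true_count(records, {'a', 'b'})}, noisy count = {answer:.3f}")
```

A smaller `privacy_budget` gives a larger noise scale and therefore a less accurate answer. Answering every possible itemset query this way does not scale for high-dimensional set-valued data, which is the gap the abstract says the partitioning approach is meant to address.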

Journal: Proceedings of the VLDB Endowment
Publication Date: Aug 1, 2011
Citations: 219

Similar Papers

Privacy-preserving heterogeneous health data sharing
Noman Mohammed ... Benjamin C M Fung
Journal of the American Medical Informatics Association | VOL. 20
13 Dec 2012

Modeling and Integrating Background Knowledge in Data Anonymization
Tiancheng Li ... Jian Zhang
01 Mar 2009

Publishing set valued data via m-privacy
Pratik Kumar Tiwari ... Sushil Chaturvedi
01 Aug 2014

Challenges of Differentially Private Release of Data Under an Open-world Assumption
Elham Naghizade ... Lars Kulik
27 Jun 2017
