Abstract

Differential privacy is a paradigm of big data privacy protection that offers guarantees even when an attacker has arbitrary background knowledge. Consequently, it is viewed as a reliable protection mechanism for sensitive information. Differential privacy introduces noise, such as Laplace noise, to obfuscate the true values in a data set while preserving its statistical properties. The amount of Laplace noise added to a data set is determined by the scale parameter of the Laplace distribution. The privacy budget $\varepsilon$ in differential privacy has been interpreted theoretically, but its practical implications for the risk of data disclosure (RoD) have not yet been well studied. Moreover, choosing an appropriate value for $\varepsilon$ is not straightforward, because it considerably affects the level of privacy in a data set. In this paper, we define and evaluate the RoD of a data set with either numerical or binary attributes for numerical or counting queries over multiple attributes, based on noise estimation. Through the confidence probability of noise estimation, we provide a simple method for selecting the privacy budget $\varepsilon$ for differential privacy and relate differential privacy to $k$-anonymization. Finally, our experimental results show the relationship between the RoD and $\varepsilon$, as well as between $\varepsilon$ and $k$. To the best of our knowledge, this is the first study to use the quantity of noise as a bridge to evaluate RoD for multiple attributes (either numerical or binary data) and to determine the relationship between differential privacy and $k$-anonymization.
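The abstract refers to the standard Laplace mechanism, in which the noise scale is set to the query's sensitivity divided by $\varepsilon$. The following is a minimal sketch of that mechanism, not the paper's own implementation; the function name, the example values, and the use of NumPy are illustrative assumptions.

```python
# Minimal sketch of the Laplace mechanism (illustrative, not the paper's code).
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Perturb a query answer with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon  # smaller epsilon -> larger noise -> stronger privacy
    noise = np.random.laplace(loc=0.0, scale=scale)
    return true_value + noise

# Example: a counting query (L1 sensitivity 1) with privacy budget epsilon = 0.5
noisy_count = laplace_mechanism(true_value=120.0, sensitivity=1.0, epsilon=0.5)
```

The sketch makes the trade-off discussed in the abstract concrete: decreasing $\varepsilon$ increases the noise scale, strengthening privacy but degrading accuracy, which is why choosing $\varepsilon$ is nontrivial.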
