Abstract

In a variety of contexts, randomization is regarded as an effective technique to conceal sensitive information. Viewing randomization mechanisms as information-theoretic channels, we start from a semantic notion of security, which expresses the absence of any privacy breach above a given level of seriousness ϵ, irrespective of any background information, represented as a prior probability on the secret inputs. We first examine this notion along two dimensions: worst vs. average case, and single vs. repeated observations. In each case, we characterize the security level achievable by a mechanism in a simple fashion that depends only on the channel matrix, and specifically on certain measures of “distance” between its rows, such as the norm-1 distance and the Chernoff Information. We next clarify the relation between our worst-case security notion and differential privacy (dp): we show that, while the former is in general stronger, the two coincide if one restricts attention to background information that can be factorized into the product of independent priors over individuals. We finally turn our attention to expected utility, in the sense of Ghosh et al., in the case of repeated independent observations. We characterize the exponential growth rate of any reasonable utility function. In the particular case where the mechanism provides ϵ-dp, we study the relation of the utility rate to ϵ: we offer either exact expressions or upper bounds for the utility rate that apply to practically interesting cases, such as the (truncated) geometric mechanism.
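To make the row-distance characterization concrete, here is a minimal Python sketch (our own illustration, not code from the paper; the helper names truncated_geometric_matrix, l1_distance, chernoff_information and dp_level are ours). It builds the channel matrix of a small ϵ-dp truncated geometric mechanism in the style of Ghosh et al., computes the norm-1 distance and the Chernoff Information between rows, and checks the worst log-ratio between adjacent rows, which recovers ϵ:

```python
import numpy as np

def truncated_geometric_matrix(n, eps):
    """Channel matrix of the eps-dp truncated geometric mechanism on
    inputs/outputs {0, ..., n}: two-sided geometric noise with ratio
    alpha = e^{-eps}, overflowing mass folded into the endpoints."""
    a = np.exp(-eps)
    P = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        for j in range(n + 1):
            if j == 0:
                P[i, j] = a**i / (1 + a)
            elif j == n:
                P[i, j] = a**(n - i) / (1 + a)
            else:
                P[i, j] = (1 - a) / (1 + a) * a**abs(i - j)
    return P

def l1_distance(p, q):
    """Norm-1 distance between two rows of the channel matrix."""
    return np.abs(p - q).sum()

def chernoff_information(p, q, grid=1000):
    """Chernoff Information C(p, q) = -min_l log sum_x p(x)^l q(x)^(1-l),
    approximated here by a simple grid search over l in (0, 1)."""
    ls = np.linspace(1e-6, 1 - 1e-6, grid)
    return -min(np.log(np.sum(p**l * q**(1 - l))) for l in ls)

def dp_level(P):
    """Largest |log P(j|i) - log P(j|i')| over adjacent inputs i, i':
    the actual differential-privacy level of the matrix."""
    return max(np.max(np.abs(np.log(P[i] / P[i + 1])))
               for i in range(P.shape[0] - 1))

P = truncated_geometric_matrix(n=5, eps=1.0)
print(P.sum(axis=1))                     # every row sums to 1
print(l1_distance(P[0], P[1]))           # row distance, single-observation case
print(chernoff_information(P[0], P[1]))  # rate-relevant distance, repeated case
print(dp_level(P))                       # ~1.0: the mechanism is 1.0-dp
```

Intuitively, the closer the rows, the harder it is to tell two secrets apart: single-observation security is driven by distances like the norm-1 one, while under repeated independent observations distinguishability grows at an exponential rate governed by quantities like the Chernoff Information.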

Highlights

  • In a variety of contexts, randomization is regarded as an effective means to conceal sensitive information

  • Anonymity protocols like Crowds [24] or the Dining Cryptographers [11] rely on randomization to “confound” the adversary as to the true actions undertaken by each participant

  • In the field of Data Mining, techniques have been proposed by which datasets containing personal information that are released for business or research purposes are perturbed with noise, so as to prevent an adversary from re-identifying individuals or learning sensitive information about them

Summary

Introduction

In a variety of contexts, randomization is regarded as an effective means to conceal sensitive information. An online, randomized data-releasing mechanism might offer users the possibility of asking the same query a number of times. This allows the user to compute more accurate answers, but it also poses potential security threats, as an adversary could remove enough noise to learn valuable information about the secret. In the scenario of a single observation, both in the average and in the worst case, we characterize the security level (absence of a breach above a certain threshold) of the randomization mechanism in a simple way that depends only on certain row-distance measures of the underlying matrix. We propose and characterize both worst- and average-case semantic notions of privacy breach, encoding resistance to arbitrary side information, and clarify their relationships with qif and dp. Due to space limitations, proofs have been omitted; they can be found in a full version available online [7].
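To see why repeated queries are a threat, consider the following toy sketch (again our own illustration, not the paper's; the noise is the two-sided geometric distribution underlying the geometric mechanism, and all names are ours). Averaging independent noisy answers to the same query concentrates around the true answer, so a determined user, or adversary, can drive the noise out:

```python
import numpy as np

rng = np.random.default_rng(0)

def two_sided_geometric(eps, size):
    """Two-sided geometric noise, P(Z = k) proportional to e^{-eps*|k|},
    sampled as the difference of two i.i.d. one-sided geometric variables."""
    a = np.exp(-eps)
    return rng.geometric(1 - a, size) - rng.geometric(1 - a, size)

true_answer = 42
for k in (1, 10, 100, 1000, 10000):
    noisy = true_answer + two_sided_geometric(eps=0.5, size=k)
    print(k, noisy.mean())   # the empirical mean converges to 42
```

This is why the repeated-observation setting calls for an asymptotic, rate-based notion of security rather than a fixed threshold, which is what the characterization via quantities like the Chernoff Information captures.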

Basic terminology
Asymptotic behavior
Semantic security of randomization mechanisms
The worst-case scenario
The average-case scenario
Asymptotic security
Worst-case scenario
Average-case scenario
Utility
Conclusion and further work