Abstract

Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to dataset statistics. We circumvent these limitations by introducing a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals' private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP's performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in a generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics.
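As a compact illustration, the constrained minimax game described above can be sketched as follows. The notation is assumed here rather than copied from the paper: g is the privatizer, h the adversary, X the public variables, Y the private variables, X̂ = g(X, Y) the sanitized data, ℓ the adversary's loss, d a distortion measure with budget D.

```latex
% Sketch of the constrained minimax formulation (notation assumed, not
% reproduced verbatim from the paper): the privatizer g maximizes the loss
% achieved by the best-responding adversary h, subject to a distortion budget D.
\[
  \max_{g} \; \min_{h} \;
    \mathbb{E}\!\left[ \ell\!\left( h(\hat{X}),\, Y \right) \right]
  \quad \text{subject to} \quad
    \mathbb{E}\!\left[ d(\hat{X},\, X) \right] \le D,
  \qquad \hat{X} = g(X, Y).
\]
```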

Highlights

  • The explosion of information collection across a variety of electronic platforms is enabling the use of inferential machine learning (ML) and artificial intelligence to guide consumers through a myriad of choices and decisions in their daily lives.

  • We focus our attention on two types of loss functions: (a) a 0-1 loss that leads to a maximum a posteriori probability (MAP) adversary; and (b) an empirical log-loss that leads to a minimum cross-entropy adversary (see the sketch after this list).

  • For the above-mentioned statistical dataset models, we present two approaches towards designing privacy mechanisms: (i) private-data dependent (PDD) mechanisms, where the privatizer uses both the public and private variables; and (ii) private-data independent (PDI) mechanisms, where the privatizer only uses the public variables.
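As a concrete illustration of the two loss functions named above, the following minimal Python sketch contrasts them for a binary private variable. The helper names are hypothetical, and `p` denotes the adversary's estimate of P(Y = 1 | sanitized data); this is an illustration, not code from the paper.

```python
# Sketch (assumed notation): y is the true private bit, p is the adversary's
# posterior estimate of P(Y = 1 | sanitized data).
import numpy as np

def zero_one_loss(y: int, p: float) -> float:
    """0-1 loss: the optimal adversary is the MAP decoder, guessing
    y_hat = 1 iff p >= 0.5, and incurs loss only when the guess is wrong."""
    y_hat = 1 if p >= 0.5 else 0
    return float(y_hat != y)

def log_loss(y: int, p: float, eps: float = 1e-12) -> float:
    """Empirical log-loss (cross-entropy): the optimal adversary reports its
    true posterior belief, and its minimal expected loss is H(Y | X_hat)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1.0 - p))

# Example: zero_one_loss(1, 0.7) == 0.0, while log_loss(1, 0.7) ≈ 0.357.
```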


Summary

Introduction

The explosion of information collection across a variety of electronic platforms is enabling the use of inferential machine learning (ML) and artificial intelligence to guide consumers through a myriad of choices and decisions in their daily lives. Issa et al. introduced maximal leakage (MaxL) to quantify leakage to a strong adversary capable of guessing any function of the dataset [55]. They showed that their adversarial model can be generalized to encompass local DP (wherein the mechanism ensures limited distinction for any pair of entries, a stronger notion of DP without a neighborhood constraint [27,56]) [57]. Using such measures as privacy metrics requires learning the parameters of the privatization mechanism in a data-driven fashion, which involves minimizing an empirical information-theoretic loss function; this task is remarkably challenging in practice [63,64,65,66,67]. An inherent challenge of any context-aware privacy approach is that it requires access to priors, such as the joint distribution of the public and private variables, and such information is hardly ever available in practice. Under GAP, the parameters of a generative model representing the privatization mechanism are instead learned from the data itself.
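The following is a minimal, hypothetical PyTorch-style sketch of such a data-driven minimax training loop. It assumes a binary private variable, a cross-entropy (log-loss) adversary, and a penalized rather than hard distortion constraint; the class and function names are illustrative and this is not the authors' implementation.

```python
# Minimal sketch of data-driven GAP training (assumptions: binary private
# bit Y, real-valued public features X, soft penalty in place of the hard
# distortion constraint). Illustrative only, not the paper's code.
import torch
import torch.nn as nn

class Privatizer(nn.Module):
    """Maps (X, Y) -> sanitized X_hat (a private-data dependent mechanism)."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d + 1, 16), nn.ReLU(), nn.Linear(16, d))
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

class Adversary(nn.Module):
    """Tries to infer the private bit Y from the sanitized X_hat."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
    def forward(self, x_hat):
        return self.net(x_hat)  # logit of P(Y = 1 | X_hat)

def train_step(priv, adv, opt_p, opt_a, x, y, rho=1.0, lam=10.0):
    bce = nn.BCEWithLogitsLoss()
    # 1) Adversary update: minimize the cross-entropy of its guess of Y.
    x_hat = priv(x, y).detach()
    adv_loss = bce(adv(x_hat), y)
    opt_a.zero_grad(); adv_loss.backward(); opt_a.step()
    # 2) Privatizer update: maximize the adversary's loss subject to a
    #    penalized distortion budget E||X_hat - X||^2 <= rho.
    x_hat = priv(x, y)
    distortion = ((x_hat - x) ** 2).sum(dim=1).mean()
    priv_loss = -bce(adv(x_hat), y) + lam * torch.relu(distortion - rho)
    opt_p.zero_grad(); priv_loss.backward(); opt_p.step()
    return adv_loss.item(), distortion.item()

# Example usage with synthetic data (hypothetical dimensions):
# d = 4
# priv, adv = Privatizer(d), Adversary(d)
# opt_p = torch.optim.Adam(priv.parameters(), lr=1e-3)
# opt_a = torch.optim.Adam(adv.parameters(), lr=1e-3)
# x = torch.randn(128, d); y = torch.bernoulli(torch.full((128, 1), 0.5))
# adv_loss, dist = train_step(priv, adv, opt_p, opt_a, x, y)
```

Alternating the two updates mirrors standard GAN training; the penalty weight `lam` is an assumed stand-in for enforcing the distortion constraint of the theoretical formulation.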

Our Contributions
Related Work
Outline
Generative Adversarial Privacy Model
Formulation
GAP under Various Loss Functions
Data-Driven GAP
Our Focus
Binary Data Model
Theoretical Approach for Binary Data Model
PDD Privacy Mechanism
PDI Privacy Mechanism
Data-driven Approach for Binary Data Model
Illustration of Results
Binary Gaussian Mixture Model
Theoretical Approach for Binary Gaussian Mixture Model
PDI Gaussian Noise Adding Privacy Mechanism
PDD Gaussian Noise Adding Privacy Mechanism
Data-driven Approach for Binary Gaussian Mixture Model
Findings
Concluding Remarks