Abstract

In the past decade, differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy. This privacy definition and its divergence-based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analysing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation of differential privacy, which we term ‘f-differential privacy’ (f-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence-based relaxations. First, f-DP faithfully preserves the hypothesis testing interpretation of differential privacy, thereby making the privacy guarantees easily interpretable. In addition, f-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for the original differential privacy definition to f-DP and, as an application of this technique, obtain a simple and easy-to-interpret theorem of privacy amplification by subsampling for f-DP. Beyond these findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class, referred to as ‘Gaussian differential privacy’ (GDP) and defined based on hypothesis testing of two shifted Gaussian distributions. GDP is the focal privacy definition among the family of f-DP guarantees due to a central limit theorem for differential privacy that we prove. More precisely, the privacy guarantees of any hypothesis-testing-based definition of privacy (including the original differential privacy definition) converge to GDP in the limit under composition.
We also prove a Berry–Esseen-style version of this central limit theorem, which gives a computationally inexpensive tool for tractably analysing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f-DP a mathematically coherent, analytically tractable and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent.
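To make the GDP notion concrete: testing N(0, 1) against N(μ, 1) at significance level α has, by the Neyman–Pearson lemma, optimal type II error Φ(Φ⁻¹(1 − α) − μ), which is the trade-off function defining μ-GDP. A minimal numerical sketch (the helper name `gdp_tradeoff` is ours, not from the paper):

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal N(0, 1)

def gdp_tradeoff(mu: float, alpha: float) -> float:
    """Optimal type II error for testing N(0,1) vs N(mu,1) at level alpha:
    G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)."""
    return _std.cdf(_std.inv_cdf(1 - alpha) - mu)
```

At μ = 0 the two hypotheses are indistinguishable and the type II error is 1 − α (perfect privacy); larger μ drives the achievable type II error down, i.e. weaker privacy.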

Highlights

  • We introduce a canonical single-parameter family of privacy notions within the f-DP class, referred to as “Gaussian differential privacy” (GDP), defined based on hypothesis testing of two shifted Gaussian distributions

  • We show that our privacy definition is closed and tight under composition, which means that the trade-off between type I and type II errors that results from the composition of an f1-DP mechanism with an f2-DP mechanism can always be exactly described by a certain function f

  • We have introduced a new framework for private data analysis that we refer to as f-differential privacy, which generalizes (ε, δ)-DP and has a number of attractive properties that escape the difficulties of prior work
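For GDP in particular, the tight composition described above takes a closed form: the paper shows that composing μ₁-GDP and μ₂-GDP mechanisms yields a √(μ₁² + μ₂²)-GDP mechanism. A small sketch of this bookkeeping (the helper name `compose_gdp` is ours):

```python
import math

def compose_gdp(mus: list[float]) -> float:
    """GDP parameter of the composition of mu_i-GDP mechanisms:
    the result is sqrt(mu_1^2 + ... + mu_n^2)-GDP."""
    return math.sqrt(sum(mu * mu for mu in mus))
```

For example, four adaptive queries, each 1-GDP, compose to a 2-GDP mechanism overall, with no loss from the analysis.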


Summary

Introduction

Modern statistical analysis and machine learning are overwhelmingly applied to data concerning people, and without a sound theory of composition it would be nearly impossible to develop complex differentially private data analysis methods. It has been known since the original papers defining differential privacy [DMNS06, DKM+06] that the composition of an (ε1, δ1)-DP mechanism and an (ε2, δ2)-DP mechanism yields an (ε1 + ε2, δ1 + δ2)-DP mechanism. However, the corresponding upper bound e^(ε1+ε2)α + δ1 + δ2 on the power of any test at significance level α no longer tightly characterizes the trade-off between significance level and power for testing between neighbouring datasets. Moreover, certain simple and fundamental primitives associated with differential privacy, most notably privacy amplification by subsampling [KLN+11], either fail to apply to the existing relaxations of differential privacy or require a substantially more complex analysis [WBK18]. This is especially problematic when analysing the privacy guarantees of stochastic gradient descent, arguably the most popular present-day optimization algorithm, in which subsampling is inherent; it is what necessitated the numerical moments accountant method of Abadi et al. [ACG+16] to sidestep the issue.
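The power bound referenced above can be sketched in a few lines: under (ε, δ)-DP, any test at significance level α has power at most e^ε·α + δ, so the naive composition rule gives the stated (and loose) bound e^(ε1+ε2)·α + δ1 + δ2. The helper names below are ours:

```python
import math

def power_upper_bound(alpha: float, eps: float, delta: float) -> float:
    """Classical DP bound: any level-alpha test distinguishing neighbouring
    datasets under (eps, delta)-DP has power at most e^eps * alpha + delta,
    capped at 1."""
    return min(1.0, math.exp(eps) * alpha + delta)

def composed_power_bound(alpha: float, eps1: float, delta1: float,
                         eps2: float, delta2: float) -> float:
    """Bound after naive composition: (eps1+eps2, delta1+delta2)-DP."""
    return power_upper_bound(alpha, eps1 + eps2, delta1 + delta2)
```

Because the bound grows exponentially in the summed ε, it saturates at 1 quickly under repeated composition, which is precisely the looseness that motivates the f-DP framework.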

Our Contributions
Trade-off Functions and f-DP
Gaussian Differential Privacy
Post-Processing and the Informativeness of f-DP
A Primal-Dual Perspective
Group Privacy
Composition and Limit Theorems
A General Composition Theorem
Central Limit Theorems for Composition
Composition of (ε, δ)-DP: Beating Berry–Esseen
Amplifying Privacy by Subsampling
A Subsampling Theorem
Proof of the Subsampling Theorem
Application
Stochastic Gradient Descent and Its Privacy Analysis
Asymptotic Privacy Analysis
A Berry–Esseen Privacy Bound
Discussion
B Conversion from f-DP to Divergence Based DP
C A Self-contained Proof of the Composition Theorem
D Omitted Proofs in Section 3
E Commutativity
F Omitted Details in Section 4
G Omitted Proofs in Section 5