Abstract
In the past decade, differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy. This privacy definition and its divergence-based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analysing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation of differential privacy, which we term ‘f-differential privacy’ (f-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence-based relaxations. First, f-DP faithfully preserves the hypothesis testing interpretation of differential privacy, thereby making the privacy guarantees easily interpretable. In addition, f-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for the original differential privacy definition to f-DP and, as an application of this technique, obtain a simple and easy-to-interpret theorem of privacy amplification by subsampling for f-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class that is referred to as ‘Gaussian differential privacy’ (GDP), defined based on hypothesis testing of two shifted Gaussian distributions. GDP is the focal privacy definition among the family of f-DP guarantees due to a central limit theorem for differential privacy that we prove. More precisely, the privacy guarantees of any hypothesis-testing-based definition of privacy (including the original differential privacy definition) converge to GDP in the limit under composition.
We also prove a Berry–Esseen-style version of the central limit theorem, which gives a computationally inexpensive tool for tractably analysing the exact composition of private algorithms. Taken together, this collection of attractive properties renders f-DP a mathematically coherent, analytically tractable and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent.
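The trade-off function underlying GDP has a closed form: for μ-GDP, the smallest type II error achievable at type I error α when testing N(0, 1) against N(μ, 1) is G_μ(α) = Φ(Φ⁻¹(1 − α) − μ), where Φ is the standard normal cumulative distribution function. A minimal sketch in Python (the function name is illustrative, not from the paper):

```python
from statistics import NormalDist

def gdp_tradeoff(mu: float, alpha: float) -> float:
    """Trade-off function G_mu of mu-GDP: the smallest type II error
    achievable at type I error alpha when testing N(0, 1) vs N(mu, 1),
    i.e. G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)."""
    std = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}
    return std.cdf(std.inv_cdf(1.0 - alpha) - mu)
```

Note that μ = 0 gives G_0(α) = 1 − α, the trade-off curve of a perfectly private mechanism, and larger μ pushes the curve down, i.e. weaker privacy.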
Highlights
In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the f-DP class that is referred to as “Gaussian differential privacy” (GDP), defined based on hypothesis testing of two shifted Gaussian distributions
We show that our privacy definition is closed and tight under composition, which means that the trade-off between type I and type II errors that results from the composition of an f1-DP mechanism with an f2-DP mechanism can always be exactly described by a certain function f
We have introduced a new framework for private data analysis that we refer to as f-differential privacy, which generalizes (ε, δ)-DP and has a number of attractive properties that escape the difficulties of prior work
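Within the GDP subfamily, the closure under composition highlighted above takes an especially simple form: by the paper's composition theorem for GDP, running a μ₁-GDP mechanism followed by a μ₂-GDP mechanism is exactly √(μ₁² + μ₂²)-GDP. A small sketch (function name is illustrative):

```python
from math import sqrt

def compose_gdp(mus: list[float]) -> float:
    """n-fold composition of mu_i-GDP mechanisms is again GDP, with
    parameter sqrt(mu_1^2 + ... + mu_n^2); the composed guarantee is
    tight, not merely an upper bound."""
    return sqrt(sum(mu * mu for mu in mus))
```

For example, four rounds of 1-GDP compose to exactly 2-GDP, mirroring how independent Gaussian noise accumulates in quadrature.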
Summary
Modern statistical analysis and machine learning are overwhelmingly applied to data concerning people, and without a tractable theory of composition it would be near impossible to develop complex differentially private data analysis methods. It has been known since the original papers defining differential privacy [DMNS06, DKM+06] that the composition of an (ε1, δ1)-DP mechanism and an (ε2, δ2)-DP mechanism yields an (ε1 + ε2, δ1 + δ2)-DP mechanism. However, the corresponding upper bound e^{ε1+ε2}α + δ1 + δ2 on the power of any test at significance level α no longer tightly characterizes the trade-off between significance level and power for testing between the neighbouring datasets S and S′. Moreover, certain simple and fundamental primitives associated with differential privacy, most notably privacy amplification by subsampling [KLN+11], either fail to apply to the existing relaxations of differential privacy or require a substantially complex analysis [WBK18]. This is especially problematic when analysing privacy guarantees of stochastic gradient descent, arguably the most popular present-day optimization algorithm, as subsampling is inherent to this algorithm. It is what necessitated Abadi et al. [ACG+16] to develop the numerical moments accountant method to sidestep the issue
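The power bound quoted above is straightforward to evaluate: under (ε, δ)-DP, any test at significance level α has power at most e^ε·α + δ (capped at 1), and basic composition simply feeds the summed parameters into the same formula. A minimal sketch, with illustrative function names, showing why the composed bound is generally loose:

```python
from math import exp

def power_bound(alpha: float, eps: float, delta: float) -> float:
    """Upper bound on the power of any level-alpha test distinguishing
    neighbouring datasets under an (eps, delta)-DP mechanism."""
    return min(1.0, exp(eps) * alpha + delta)

def composed_power_bound(alpha: float,
                         eps1: float, delta1: float,
                         eps2: float, delta2: float) -> float:
    """Bound obtained via basic composition: the two-fold composition is
    (eps1 + eps2, delta1 + delta2)-DP, so apply the single-mechanism
    bound with the summed parameters. This is what fails to be tight."""
    return power_bound(alpha, eps1 + eps2, delta1 + delta2)
```

The exponential dependence on the summed ε terms is why this bound degrades quickly under many-fold composition, while the actual trade-off curve of the composed mechanism can remain far more favourable.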
Journal of the Royal Statistical Society Series B: Statistical Methodology