Abstract

A common goal of privacy research is to release synthetic data that satisfies a formal privacy guarantee and can be used by an analyst in place of the original data. To achieve reasonable accuracy, a synthetic data set must be tuned to support a specified set of queries accurately, sacrificing fidelity for other queries. This work considers methods for producing synthetic data under differential privacy and investigates what makes a set of queries "easy" or "hard" to answer. We consider this issue in the particular case of answering sets of linear counting queries using the matrix mechanism (Li et al. 2010), a recent differentially private mechanism that can reduce error by adding complex, correlated noise adapted to a specified workload. Our main result is a novel lower bound on the minimum total error required to simultaneously release answers to a set of workload queries when using the matrix mechanism. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. Under (ε, δ)-differential privacy, we prove that this bound is tight for many common workloads, such as the set of all predicate queries and the set of all k-way marginals. Our empirical study also indicates that this bound is close to tight on workloads consisting of random interval queries or random marginals.
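
To make "represented in matrix form" concrete, here is a minimal NumPy sketch. It is an illustration, not code from the paper. It assumes a data vector x of per-cell counts over a small domain and encodes each linear counting query as one row of a workload matrix W, so the true answers are W @ x. The helper names (`prefix_workload`, `all_range_workload`, `spectral_summary`) are hypothetical. The sketch only computes the singular values of W, the kind of spectral property the bound is said to depend on. It does not reproduce the paper's bound or its constants, and the sum of singular values is shown merely as one example of a spectral summary.

```python
from __future__ import annotations

import numpy as np


def prefix_workload(n: int) -> np.ndarray:
    """Hypothetical helper: the n prefix counting queries over a domain of n cells.

    Row j is the query "count records whose value lies in cells 0..j", so the
    workload matrix is lower-triangular with ones on and below the diagonal.
    """
    return np.tril(np.ones((n, n)))


def all_range_workload(n: int) -> np.ndarray:
    """Hypothetical helper: every interval query [i, j] over a domain of n cells."""
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i : j + 1] = 1.0  # count records falling in cells i..j inclusive
            rows.append(row)
    return np.array(rows)


def spectral_summary(W: np.ndarray) -> tuple[np.ndarray, float]:
    """Return the singular values of W (in descending order) and their sum."""
    s = np.linalg.svd(W, compute_uv=False)
    return s, float(s.sum())


if __name__ == "__main__":
    x = np.array([3.0, 0.0, 5.0, 2.0])  # hypothetical data vector: counts per cell
    W = all_range_workload(4)           # 10 interval queries, one per row
    print("true answers:", W @ x)
    s, total = spectral_summary(W)
    print("singular values:", np.round(s, 3))
    print("sum of singular values:", round(total, 3))
```

Because every query is a row of W, two workloads over the same domain can be compared through their singular values alone, independently of the data vector x.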
