Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Cynthia Dwork, Aleksandar Nikolov, Kunal Talwar

doi:10.1007/s00454-015-9678-x


Open Access



Abstract

Differential privacy is a definition giving a strong privacy guarantee even in the presence of auxiliary information. In this work, we pursue the application of geometric techniques for achieving differential privacy, a highly promising line of work initiated by Hardt and Talwar (Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC'10, pp 705–714. ACM Press, New York, 2010). We apply these techniques to the problem of marginal release. Here, a database refers to a collection of the data of $n$ individuals, each characterized by $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, together with a $|S|$-dimensional binary vector $\beta$ specifying their values. The true answer to this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $\beta$. Information theoretically, the error complexity of marginal queries (how wrong do the answers have to be in order to preserve differential privacy) is well understood: the per-query additive error is known to be at least $\Omega(\min\{\sqrt{n}, d^{k/2}\})$ and at most $\tilde{O}(\sqrt{n}\, d^{\lceil k/2\rceil/4})$. However, no polynomial time algorithm with error complexity as low as the information-theoretic upper bound is known for small $n$. We present a polynomial time algorithm that matches the best known information-theoretic bounds when $k=2$; more generally, by reducing to the case $k=2$, for any distribution on marginal queries, our algorithm achieves average error at most $\tilde{O}(\sqrt{n}\, d^{\lceil k/2\rceil/4})$, an improvement over previous work when $k$ is small and when error $o(n)$ is desirable. Using private boosting, we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov et al. (Proceedings of the 45th Annual ACM Symposium on Theory of Computing, STOC'13, pp 351–360. ACM Press, New York, 2013), wherein a vector of sufficiently noisy answers is projected onto a particular convex body. We reduce the projection step, which is expensive, to a simple geometric question: given (a succinct representation of) a convex body $K$, find a containing convex body $L$ that one can efficiently optimize over, while keeping the Gaussian width of $L$ small. This reduction is achieved by a careful use of the Frank–Wolfe algorithm.
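To make two of the objects in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It shows how a $k$-way marginal count could be evaluated on a small binary database, and how a generic Frank-Wolfe iteration approximately projects a point onto a convex body using only a linear optimization oracle. The function names `marginal_count` and `frank_wolfe_project`, the explicit vertex list standing in for the body, and the step-size rule $2/(t+2)$ are assumptions made for illustration. The sketch adds no noise and provides no privacy guarantee.

```python
import numpy as np

def marginal_count(db, S, beta):
    """Count rows of a binary database (n x d) whose attributes in S equal beta."""
    db = np.asarray(db)
    return int(np.all(db[:, list(S)] == np.asarray(beta), axis=1).sum())

def frank_wolfe_project(y, vertices, steps=2000):
    """Approximately project y onto conv(vertices) with Frank-Wolfe.

    `vertices` is an (m, p) array whose rows span the body; enumerating them
    is a hypothetical stand-in for an efficient linear optimization oracle.
    """
    V = np.asarray(vertices, dtype=float)
    y = np.asarray(y, dtype=float)
    x = V[0].copy()                      # start at an arbitrary vertex
    for t in range(steps):
        grad = x - y                     # gradient of 0.5 * ||x - y||^2
        s = V[np.argmin(V @ grad)]       # vertex minimizing the linear form <grad, s>
        gamma = 2.0 / (t + 2.0)          # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Toy database: n = 4 individuals, d = 3 binary attributes.
db = [[1, 0, 1],
      [1, 1, 1],
      [0, 0, 1],
      [1, 0, 0]]
# 2-way marginal: attributes S = {0, 2} both equal to 1.
print(marginal_count(db, S=[0, 2], beta=[1, 1]))

# Project a point lying outside a triangle back onto the triangle.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(frank_wolfe_project([2.0, 2.0], triangle))
```

Each Frank-Wolfe step needs only a linear minimization over the body, which is why the abstract's question is about finding a containing body $L$ that is easy to optimize over; here the toy vertex list plays that role.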
