Abstract

We consider non-parametric density estimation in the framework of local differential privacy, both pure and approximate. In contrast to centralized privacy scenarios with a trusted curator, in the local setup anonymization must be guaranteed already on the individual data owners' side and must therefore precede any data mining tasks. Thus, the published anonymized data should be compatible with as many statistical procedures as possible. We consider different mechanisms to establish pure and approximate differential privacy, respectively. We obtain minimax type results over Sobolev classes indexed by a smoothness parameter $s > 1/2$ for the mean squared error at a fixed point. In particular, we show that appropriately defined kernel density estimators can attain the optimal rate of convergence if the bandwidth parameter is correctly specified. Notably, the optimal convergence rate in terms of the sample size $n$ is $n^{-(2s-1)/(2s+1)}$ under pure differential privacy and is thus slower than the rate $n^{-(2s-1)/(2s)}$, which holds both without privacy restrictions and under approximate differential privacy. Since the optimal choice of the bandwidth parameter depends on the smoothness $s$ and is thus not accessible in practice, adaptive methods for bandwidth selection are necessary and, in the local privacy framework, must be performed based on the anonymized data only. We address this problem by means of variants of Lepski's method tailored to the privacy setups at hand and obtain general oracle inequalities for private kernel density estimators. In the Sobolev case, the resulting adaptive estimators attain the optimal rates of convergence at least up to logarithmic factors. Along the way, we discuss some critical issues related to the notion of approximate differential privacy.
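To make the idea of a locally private kernel density estimator concrete, here is a minimal sketch of pure local differential privacy via Laplace noise at a fixed point $x_0$. It is an illustration under stated assumptions, not the paper's exact mechanism: the sinc kernel $K(u) = \sin(u)/(\pi u)$, the privacy level `alpha`, the noise scale $2/(\pi h \alpha)$ (derived from the bound $|K_h| \le 1/(\pi h)$), and all function names are hypothetical choices. Each data owner releases one noisy number, and the analyst only averages the releases.

```python
import math
import random

# Hypothetical sketch: the sinc kernel and the Laplace scale below are
# illustrative assumptions, not the paper's exact mechanism.


def sinc_kernel(u: float) -> float:
    """Sinc kernel K(u) = sin(u) / (pi * u); |K(u)| <= 1/pi since |sin u| <= |u|."""
    if u == 0.0:
        return 1.0 / math.pi
    return math.sin(u) / (math.pi * u)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) draw as the difference of two iid exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(x_i: float, x0: float, h: float, alpha: float,
              rng: random.Random) -> float:
    """Data owner's side: release K_h(x0 - x_i) plus Laplace noise.

    Since |K_h| <= 1/(pi*h), the kernel term changes by at most 2/(pi*h)
    when x_i changes, so Laplace noise of scale 2/(pi*h*alpha) makes the
    single release alpha-locally differentially private.
    """
    value = sinc_kernel((x0 - x_i) / h) / h
    return value + laplace_noise(2.0 / (math.pi * h * alpha), rng)


def private_kde(sample, x0: float, h: float, alpha: float, seed: int = 0) -> float:
    """Analyst's side: average the anonymized releases of all data owners."""
    rng = random.Random(seed)
    released = [privatize(x, x0, h, alpha, rng) for x in sample]
    return sum(released) / len(released)
```

For example, `private_kde(sample, x0=0.0, h=0.4, alpha=1.0)` on a large standard normal sample should land near $1/\sqrt{2\pi} \approx 0.399$. Averaging is the only step the analyst performs, so the raw observations never leave the data owners.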

Highlights

  • In the modern information era, data are routinely collected in all areas of private and public life

  • Bandwidth selection is usually a delicate issue, and it is no less so in our local privacy setup

  • We have investigated the optimal rates of convergence for pointwise estimation of a probability density over Sobolev classes for both pure and approximate local differential privacy
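The two rates can be motivated by a standard bias-variance heuristic. This sketch is not taken from the paper and omits constants. It assumes a sinc-type kernel with $K_h(u) = K(u/h)/h$, a Sobolev-type bound giving squared pointwise bias of order $h^{2s-1}$, and, in the private case, Laplace noise of scale proportional to $1/(h\alpha)$ added by every data owner, where $\alpha$ is a privacy level not defined in this summary.

```latex
% Heuristic bias-variance balance at a fixed point x_0 (constants omitted)
\begin{aligned}
\text{non-private:}\quad
  &\mathrm{MSE}(h) \lesssim h^{2s-1} + \frac{1}{nh}
  &&\Longrightarrow\; h^{\ast} \asymp n^{-1/(2s)},\;
    \mathrm{MSE}(h^{\ast}) \asymp n^{-(2s-1)/(2s)},\\
\text{pure LDP (Laplace):}\quad
  &\mathrm{MSE}(h) \lesssim h^{2s-1} + \frac{1}{nh} + \frac{1}{n\alpha^{2}h^{2}}
  &&\Longrightarrow\; h^{\ast} \asymp (n\alpha^{2})^{-1/(2s+1)},\;
    \mathrm{MSE}(h^{\ast}) \asymp (n\alpha^{2})^{-(2s-1)/(2s+1)}.
\end{aligned}
```

For $\alpha \le 1$ and $h \le 1$, the noise term $1/(n\alpha^2 h^2)$ dominates $1/(nh)$, which is why the exponent changes. Since $(2s-1)/(2s+1) < (2s-1)/(2s)$ for every $s > 1/2$, the private rate is the slower one.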

Summary

Introduction

In the modern information era, data are routinely collected in all areas of private and public life.

In order to perform the Lepski scheme here, every data owner has to publish the kernel density estimator not for one single bandwidth but for a finite set of potential bandwidths. Such a multiple output still guarantees the desired privacy condition provided that the additive noise is multiplied by a factor proportional to the number of potential bandwidths, which in our case is logarithmic in the number of data sources. Note that this issue arises only in the local privacy setup: in the global framework, the trusted curator can apply the existing plethora of bandwidth selection methods to the unmasked data and publish only the resulting estimator, with the adaptively determined bandwidth, in anonymized form. For the specific example of Sobolev ellipsoids, the rates of convergence deteriorate only by logarithmic factors compared with the case of a priori known smoothness.
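The multi-bandwidth release and the Lepski-type selection described above can be sketched as follows. This is a hedged illustration, not the paper's procedure. The sinc kernel, the even split of the budget `alpha` over the grid (which multiplies each Laplace scale by the grid size `m`), the threshold `kappa * sigma(h') * sqrt(log n)`, the crude variance bound `sigma`, and all function names are hypothetical assumptions. The grid is assumed to be sorted in decreasing order; a grid of size about $\log_2 n$ would match the logarithmic growth mentioned in the text.

```python
import math
import random

# Hypothetical sketch of a Lepski-type bandwidth choice computed from
# anonymized releases only; kernel, noise split and threshold are assumptions.


def sinc_kernel(u: float) -> float:
    """Sinc kernel K(u) = sin(u) / (pi * u), bounded by 1/pi."""
    if u == 0.0:
        return 1.0 / math.pi
    return math.sin(u) / (math.pi * u)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) draw as the difference of two iid exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_vector(x_i, x0, bandwidths, alpha, rng):
    """One data owner's output: a noisy kernel value for every bandwidth.

    The budget alpha is split evenly (alpha/m per bandwidth), so the
    single-bandwidth Laplace scale 2/(pi*h*alpha) is multiplied by m;
    by sequential composition the whole vector is alpha-LDP.
    """
    m = len(bandwidths)
    return [
        sinc_kernel((x0 - x_i) / h) / h
        + laplace_noise(2.0 * m / (math.pi * h * alpha), rng)
        for h in bandwidths
    ]


def lepski_select(releases, bandwidths, alpha, kappa=1.0):
    """Pick the largest h whose estimate agrees with all smaller bandwidths.

    bandwidths must be sorted in decreasing order. Agreement means
    |f_h - f_h'| <= thr(h') for every smaller h'. Returns (h_hat, estimate).
    """
    n, m = len(releases), len(bandwidths)
    estimates = [sum(r[j] for r in releases) / n for j in range(m)]

    def thr(j):
        h = bandwidths[j]
        b = 2.0 * m / (math.pi * h * alpha)  # Laplace scale at bandwidth h
        # Laplace variance 2*b^2 plus the bound E[K_h^2] <= 1/(pi*h)^2.
        var = (2.0 * b * b + 1.0 / (math.pi * h) ** 2) / n
        return kappa * math.sqrt(var * math.log(n))

    # The smallest bandwidth has nothing below it, so the loop always returns.
    for j in range(m):
        if all(abs(estimates[j] - estimates[k]) <= thr(k) for k in range(j + 1, m)):
            return bandwidths[j], estimates[j]
```

A typical call releases `release_vector(x, 0.0, [1.0, 0.5, 0.25, 0.125, 0.0625], 2.0, rng)` for every observation and passes the list of vectors to `lepski_select`. The selection uses only the released vectors, which is the constraint the local setup imposes.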

Definition of approximate differential privacy
Pure differential privacy by adding Laplace noise
Approximate differential privacy by random replacement
A composition lemma for approximate differential privacy
Private minimax estimation
Upper bound
Adaptation to unknown smoothness
Adaptive estimation for Laplace perturbation
Adaptive estimation for approximate differential privacy
Discussion