Abstract

In this paper we revisit the classical problem of nonparametric regression, but impose local differential privacy constraints. Under such constraints, the raw data (X1,Y1),...,(Xn,Yn), taking values in Rd×R, cannot be directly observed, and all estimators are functions of the randomised output from a suitable privacy mechanism. The statistician is free to choose the form of the privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of a feature vector Xi and to the value of its response variable Yi. Based on this randomised data, we design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator. The main result is that the estimator is strongly universally consistent, and we further establish an upper bound on the rate of convergence. Our methods and analysis also give rise to a strongly universally consistent binary classification rule for locally differentially private data.
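To make the mechanism described above concrete, the following minimal Python sketch illustrates one possible reading of the abstract for a one-dimensional feature: the cell-membership indicator of Xi over a fixed grid and the truncated response Yi each receive independent Laplace noise, and the estimator aggregates the released values cell by cell, as in a partitioning regression estimate. The grid `edges`, the privacy parameter `alpha`, the truncation level `tau`, the noise scales, and the guard against a non-positive denominator are all illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)


def privatise(x, y, edges, alpha, tau):
    """Release a locally privatised view of one raw pair (x, y); sketch only."""
    # Discretise the location of x: a one-hot indicator of its grid cell.
    m = len(edges) - 1
    j = int(np.searchsorted(edges, x, side="right")) - 1
    indicator = np.zeros(m)
    if 0 <= j < m:
        indicator[j] = 1.0
    # Truncate the response so that the released value has bounded sensitivity.
    y_trunc = float(np.clip(y, -tau, tau))
    # Add independent Laplace noise to the discretised location and to the
    # truncated response; the scales split an alpha-LDP budget in half and are
    # an illustrative calibration, not the one used in the paper.
    z_loc = indicator + rng.laplace(scale=4.0 / alpha, size=m)
    z_resp = y_trunc + rng.laplace(scale=4.0 * tau / alpha)
    return z_loc, z_resp


def estimate(private_sample, x0, edges):
    """Privatised partitioning estimate at x0: ratio of noisy per-cell sums."""
    j = int(np.searchsorted(edges, x0, side="right")) - 1
    num = sum(z_resp * z_loc[j] for z_loc, z_resp in private_sample)
    den = sum(z_loc[j] for z_loc, _ in private_sample)
    return num / den if den > 0 else 0.0


# Toy usage: a synthetic one-dimensional sample, privatised point by point.
edges = np.linspace(0.0, 1.0, 11)   # partition [0, 1] into 10 equal cells
xs = rng.uniform(size=200)
ys = xs ** 2 + rng.normal(scale=0.1, size=200)
private_sample = [privatise(x, y, edges, alpha=2.0, tau=5.0) for x, y in zip(xs, ys)]
print(estimate(private_sample, 0.35, edges))
```

Because the Laplace noise is independent of the data and has mean zero, the noisy numerator and denominator are unbiased for their non-private counterparts, which is the intuition behind consistency of such ratio estimators; the paper's analysis handles the exact mechanism and noise calibration.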

Highlights

  • In recent years there has been a surge of interest in data analysis methodology that is able to achieve strong statistical performance without compromising the privacy and security of individual data holders.

  • The concept of differential privacy [15] was introduced to provide a rigorous notion of how much private information about individuals is contained in published statistics. Statistical treatments of this framework include [36, 23, 2, 6]. While it is a suitable constraint for many problems, procedures that are differentially private often require the presence of a third party, who must be trusted to handle the raw data before statistics are published.

  • The local differential privacy constraint [see, for example, 21, 12, and the references therein] was introduced to provide a setting where analysis must be carried out in such a way that each raw data point is only ever seen by the original data holder.

Summary

Introduction

In recent years there has been a surge of interest in data analysis methodology that is able to achieve strong statistical performance without compromising the privacy and security of individual data holders. While differential privacy is a suitable constraint for many problems, procedures that are differentially private often require the presence of a third party, who must be trusted to handle the raw data before statistics are published. To address this shortcoming, the local differential privacy constraint [see, for example, 21, 12, and the references therein] was introduced to provide a setting where analysis must be carried out in such a way that each raw data point is only ever seen by the original data holder. Since the problem of classification is strictly easier than regression, our methods and analysis also give rise to a strongly universally consistent binary classification rule for locally differentially private data.

Preliminaries
Our regression estimation method and its strong universal consistency
Local differential privacy
Consequences in classification
Proofs and auxiliary results