Abstract
In this paper we revisit the classical problem of nonparametric regression, but impose local differential privacy constraints. Under such constraints, the raw data (X1,Y1),...,(Xn,Yn), taking values in Rd×R, cannot be directly observed, and all estimators are functions of the randomised output from a suitable privacy mechanism. The statistician is free to choose the form of the privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of a feature vector Xi and to the value of its response variable Yi. Based on this randomised data, we design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator. The main result is that the estimator is strongly universally consistent, and we further establish an upper bound on the rate of convergence. Our methods and analysis also give rise to a strongly universally consistent binary classification rule for locally differentially private data.
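The mechanism described above can be illustrated with a small simulation. The sketch below is not the paper's exact procedure: the one-dimensional setting, the noise scales, and the function names (`privatise`, `estimate`) are illustrative assumptions. Each data holder discretises their feature onto a partition of [0, 1], then releases a one-hot cell indicator and a truncated response, both perturbed with Laplace noise; the analyst forms a ratio of noisy sums per cell, a privatised analogue of the partitioning regression estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatise(X, Y, m, alpha, tau):
    """Each data holder releases only noisy quantities (local DP sketch).

    X : (n,) features in [0, 1]; Y : (n,) responses.
    m : number of equal-width cells partitioning [0, 1].
    alpha : privacy parameter; tau : truncation level for Y.
    Noise scales here are illustrative, not the paper's calibration.
    """
    n = len(X)
    bins = np.minimum((X * m).astype(int), m - 1)  # cell index of each X_i
    one_hot = np.zeros((n, m))
    one_hot[np.arange(n), bins] = 1.0
    Yt = np.clip(Y, -tau, tau)  # truncate the response before privatising
    # Laplace noise on the cell-indicator part and the response part,
    # splitting the privacy budget between the two reports.
    Z = one_hot + rng.laplace(scale=4.0 / alpha, size=(n, m))
    W = Yt[:, None] * one_hot + rng.laplace(scale=4.0 * tau / alpha, size=(n, m))
    return Z, W

def estimate(Z, W, x, m):
    """Privatised partitioning estimate at a point x in [0, 1]."""
    j = min(int(x * m), m - 1)
    denom = Z[:, j].sum()
    return W[:, j].sum() / denom if denom > 0 else 0.0
```

The analyst never sees the raw pairs (X_i, Y_i): only the privatised matrices Z and W leave each data holder, which is the defining feature of the local model.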
Highlights
In recent years there has been a surge of interest in data analysis methodology that is able to achieve strong statistical performance without compromising the privacy and security of individual data holders.
The concept of differential privacy [15] was introduced to provide a rigorous notion of the amount of private information on individuals that published statistics contain. Statistical treatments of this framework include [36, 23, 2, 6]. While it is a suitable constraint for many problems, procedures that are differentially private often require the presence of a third party, who must be trusted to handle the raw data before statistics are published.
The local differential privacy constraint [see, for example, 21, 12, and the references therein] was introduced to provide a setting where analysis must be carried out in such a way that each raw data point is only ever seen by the original data holder.
Summary
In recent years there has been a surge of interest in data analysis methodology that is able to achieve strong statistical performance without compromising the privacy and security of individual data holders. While differential privacy is a suitable constraint for many problems, procedures that are differentially private often require the presence of a third party, who must be trusted to handle the raw data before statistics are published. To address this shortcoming, the local differential privacy constraint [see, for example, 21, 12, and the references therein] was introduced to provide a setting where analysis must be carried out in such a way that each raw data point is only ever seen by the original data holder. Since the problem of classification is strictly easier than regression, our methods and analysis also give rise to a strongly universally consistent binary classification rule for locally differentially private data.