Abstract

Collecting and analyzing the massive data generated by smart devices has become increasingly pervasive in crowdsensing and forms a building block of data-driven decision-making. However, extensive statistics and analysis of such data can seriously threaten the privacy of participating users. Local differential privacy (LDP) has been proposed as a prevalent privacy model with a distributed architecture that provides strong privacy guarantees for each user while data are collected and analyzed. Under LDP, each user’s data is first perturbed locally on the client side and only then sent to the server, which protects the data from privacy leaks on both the client side and the server side. This survey presents a comprehensive and systematic overview of LDP with respect to privacy models, research tasks, enabling mechanisms, and applications. Specifically, we first provide a theoretical summary of LDP, including the LDP model, the variants of LDP, and the basic framework of LDP algorithms. We then investigate and compare the diverse LDP mechanisms for data statistics and analysis tasks from the perspectives of frequency estimation, mean estimation, and machine learning. We also summarize practical LDP-based application scenarios. Finally, we outline several future research directions for LDP.
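The client-side perturbation and server-side analysis described above can be illustrated with binary randomized response, a standard textbook LDP mechanism. The sketch below is a minimal illustration, not a mechanism taken from this survey; the function names (`perturb_bit`, `estimate_frequency`, `simulate`) and all parameter values are assumptions chosen for the example.

```python
import math
import random


def perturb_bit(bit, epsilon, rng):
    """Client side: keep the true bit with probability e^eps / (e^eps + 1),
    otherwise flip it. This satisfies epsilon-LDP for a single bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports, epsilon):
    """Server side: debias the observed fraction of 1s.
    E[observed] = f * p + (1 - f) * (1 - p), solved for f."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def simulate(n_users=100_000, true_fraction=0.3, epsilon=1.0, seed=0):
    """Hypothetical experiment: returns (true frequency, LDP estimate)."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < true_fraction else 0 for _ in range(n_users)]
    reports = [perturb_bit(b, epsilon, rng) for b in bits]
    return sum(bits) / n_users, estimate_frequency(reports, epsilon)


if __name__ == "__main__":
    truth, estimate = simulate()
    print(f"true frequency: {truth:.4f}, LDP estimate: {estimate:.4f}")
```

The server only ever sees the perturbed reports, yet the debiased estimate approaches the true frequency as the number of users grows; a smaller `epsilon` gives stronger privacy at the cost of a noisier estimate.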

Highlights

  • With the rapid development of wireless communication techniques, Internet-connected devices are ever-increasing and generate large amounts of data by crowdsensing [1]

  • A comprehensive survey of local differential privacy (LDP) for data statistics and analysis is still needed to help newcomers understand this complex and active research area

  • We explore practical applications of LDP to show how it is implemented in real systems (e.g., Google Chrome, Apple iOS), edge computing, hypothesis testing, social networks, and recommendation systems


Summary

Introduction

With the rapid development of wireless communication techniques, the number of Internet-connected devices (e.g., smart devices and IoT appliances) keeps increasing, and these devices generate large amounts of data through crowdsensing [1]. A comprehensive survey of LDP for data statistics and analysis is still needed to help newcomers understand this complex and active research area. In this survey, we give an in-depth overview of LDP with respect to its privacy models, the related research tasks for various data types, enabling mechanisms, and applications. From the perspective of research tasks, we group the existing LDP-based privacy-preserving mechanisms into three categories: frequency estimation, mean estimation, and machine learning.
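To make the mean estimation category concrete, the following sketch perturbs each user's numeric value locally with Laplace noise and lets the server average the noisy reports. It assumes values bounded in [-1, 1] and uses the generic Laplace mechanism, which is not necessarily one of the specific mechanisms compared in this survey; the names `perturb_value`, `estimate_mean`, and `simulate_mean` are hypothetical.

```python
import random


def perturb_value(value, epsilon, rng):
    """Client side: add Laplace noise with scale 2 / epsilon.
    Assumes value lies in [-1, 1], so two inputs differ by at most 2."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    scale = 2.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) sample.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def estimate_mean(reports):
    """Server side: the noise has zero mean, so a plain average is unbiased."""
    return sum(reports) / len(reports)


def simulate_mean(n_users=200_000, epsilon=1.0, seed=0):
    """Hypothetical experiment: returns (true mean, LDP estimate)."""
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(n_users)]
    reports = [perturb_value(v, epsilon, rng) for v in values]
    return sum(values) / n_users, estimate_mean(reports)


if __name__ == "__main__":
    truth, estimate = simulate_mean()
    print(f"true mean: {truth:.4f}, LDP estimate: {estimate:.4f}")
```

Each individual report is dominated by noise, but the error of the averaged estimate shrinks roughly with the square root of the number of users.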

Theoretical Summarization of LDP
LDP Model
Definition
The Principal Method for Achieving LDP
Comparisons with Global Differential Privacy
A General Processing Framework of Local Differential Privacy
LDP Model Settings
The Framework of LDP Algorithm
The Variants of LDP
BLENDER
Local d-Privacy
ID-LDP
Frequency Estimation with LDP
General Frequency Estimation on Categorical Data
Direct Perturbation
Unary Encoding
Hash Encoding
Transformation
Subset Selection
Frequency Estimation on Set-Valued Data
Item Distribution Estimation
Frequent Items Mining
Frequent Itemset Mining
New Terms Discovery
Frequency Estimation on Key-Value Data
Frequency Estimation on Ordinal Data
Frequency Estimation on Numeric Data
Marginal Release on Multi-Dimensional Data
Conditional Probability Distribution Estimation
Frequency Estimation on Evolving Data
Mean Value Estimation with LDP
Mean Value Estimation on Evolving Data
Machine Learning with LDP
Supervised Learning
Unsupervised Learning
Empirical Risk Minimization
Deep Learning
Reinforcement Learning
Federated Learning
LDP in Real Practice
LDP in Various Fields
Edge Computing
Hypothesis Testing
Social Network
Recommendation System
Strengthen Theoretical Underpinnings
Focus on Data Correlations
Address High-Dimensional Data Analysis
Develop Prototypical Systems
Conclusions