Preserving User Privacy for Machine Learning: Local Differential Privacy or Federated Machine Learning?

Huadi Zheng, Ziyang Han, Haibo Hu

doi:10.1109/mis.2020.3010335

Open Access

https://doi.org/10.1109/mis.2020.3010335

Journal: IEEE Intelligent Systems	Publication Date: Jul 1, 2020
Citations: 54	License type: other-oa

Affiliation: Hong Kong Polytechnic University

Abstract

The growing number of mobile and IoT devices has nourished many intelligent applications. To produce high-quality machine learning models, these applications constantly access and collect rich personal data such as photos, browsing history, and text messages. However, direct access to personal data has raised increasing public concern about privacy risks and security breaches. To address these concerns, there are two emerging solutions to privacy-preserving machine learning, namely local differential privacy and federated machine learning. The former is a distributed data collection strategy in which each client perturbs its data locally before submitting it to the server, whereas the latter is a distributed machine learning strategy that trains models locally on mobile devices and merges their outputs (e.g., parameter updates of a model) through a control protocol. In this article, we conduct a comparative study on the efficiency and privacy of both solutions. Our results show that in a standard population and domain setting, both can achieve an optimal misclassification rate lower than 20%, and federated machine learning generally performs better at the cost of higher client CPU usage. Nonetheless, local differential privacy can benefit more from a larger client population (> 1k). As for the privacy guarantee, local differential privacy also has flexible control over data leakage.
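To make the two strategies concrete, here is a minimal Python sketch of what each one does. It is an illustration under stated assumptions, not the setup evaluated in the article: binary randomized response is used as a textbook local differential privacy mechanism, and a sample-weighted average stands in for the merging step of federated learning. All function names and parameters (`randomized_response`, `estimate_frequency`, `federated_average`, `epsilon`) are hypothetical.

```python
import math
import random
from typing import List


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Client side (LDP): report the true bit with probability
    e^eps / (e^eps + 1); otherwise report the flipped bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports: List[int], epsilon: float) -> float:
    """Server side (LDP): debias the observed fraction of 1s.
    observed = (1 - p) + f * (2p - 1), solved for the true fraction f."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def federated_average(updates: List[List[float]], sizes: List[int]) -> List[float]:
    """Server side (federated learning): merge client parameter vectors,
    weighting each client by its number of local samples (an assumption)."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    # 20,000 simulated clients, 30% of whom hold the sensitive bit 1.
    true_bits = [1 if i % 10 < 3 else 0 for i in range(20000)]
    reports = [randomized_response(b, 1.0, rng) for b in true_bits]
    print("estimated fraction of 1s:", estimate_frequency(reports, 1.0))
    print("merged parameters:", federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this sketch a smaller `epsilon` flips more reports, which strengthens privacy but makes the debiased estimate noisier for a fixed number of clients. That is one way to read the abstract's remark that local differential privacy benefits from a larger client population.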

Full Text