Abstract

Machine learning is increasingly used to produce predictive models for crowdsensing applications such as health monitoring and query suggestion. These models are more accurate when trained on large amounts of data collected from different sources. However, such massive data collection raises serious privacy concerns. Personal crowdsensing data such as photos, voice recordings, and locations are often highly sensitive and, once sent to the collecting companies, fall out of the control of the crowdsensing users who own them. These concerns may rule out transmitting all user data to a central location and training there with conventional machine learning approaches. In this paper, we advocate an alternative approach that keeps data on the user side and learns a shared model by iteratively coordinating the local training of crowdsensing users. Specifically, we focus on regularized empirical risk minimization and propose an efficient decomposition-based scheme that enables multiple crowdsensing users to jointly learn an accurate model for a given learning objective without sharing their private crowdsensing data. We exploit the fact that the optimization problems used in many learning tasks are decomposable and can be solved in a parallel and distributed manner by the alternating direction method of multipliers (ADMM). Because user devices are heterogeneous in practice, we propose an asynchronous ADMM algorithm to speed up training. Our scheme lets users train independently on their own crowdsensing data and share only certain updated model parameters instead of raw data. Moreover, our scheme integrates secure computation and distributed noise generation in a novel way to guarantee differential privacy of the shared parameters during the execution of the asynchronous ADMM algorithm.
We analyze the privacy guarantee and empirically demonstrate the privacy-utility trade-off of our privacy-preserving collaborative learning scheme on real-world data.
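As a rough illustration of the decomposition described above, the sketch below runs consensus ADMM on one instance of regularized empirical risk minimization, ridge regression (our choice for concreteness; the paper targets a general objective). Each user keeps its data (A_i, b_i) locally and sends only the vector x_i + u_i. To mimic heterogeneous devices, in each round only a random subset of users reports (probability `p_active`, a hypothetical parameter), and the server recomputes the consensus variable z from the most recent vector it holds for every user. The privacy step is heavily simplified: each user adds Gaussian noise with standard deviation `noise_std` to what it shares. This is a local-perturbation stand-in, not the paper's secure-computation-based distributed noise generation, and `noise_std` is not calibrated to any differential-privacy budget. All function and parameter names are hypothetical.

```python
import numpy as np


def local_update(A, b, z, u, x_prev, rho):
    """One user's step: scaled dual update with the latest z, then closed-form x update."""
    u = u + x_prev - z
    d = A.shape[1]
    # x = argmin 0.5*||A x - b||^2 + (rho/2)*||x - z + u||^2
    x = np.linalg.solve(A.T @ A + rho * np.eye(d), A.T @ b + rho * (z - u))
    return x, u


def async_admm_ridge(parts, lam=1.0, rho=1.0, rounds=300, p_active=0.5,
                     noise_std=0.0, seed=0):
    """Consensus ADMM for ridge regression in which only a random subset of
    users reports each round (a crude model of heterogeneous devices).

    parts: list of (A_i, b_i) pairs that never leave their owners.
    noise_std: std of Gaussian noise each user adds to the vector it shares
    (a simplified stand-in for the paper's privacy mechanism, not calibrated DP).
    """
    rng = np.random.default_rng(seed)
    n, d = len(parts), parts[0][0].shape[1]
    x = [np.zeros(d) for _ in range(n)]
    u = [np.zeros(d) for _ in range(n)]
    shared = [np.zeros(d) for _ in range(n)]  # latest x_i + u_i held by the server
    z = np.zeros(d)
    for _ in range(rounds):
        active = [i for i in range(n) if rng.random() < p_active]
        for i in active:
            A, b = parts[i]
            x[i], u[i] = local_update(A, b, z, u[i], x[i], rho)
            # Only this (possibly noisy) vector leaves the device, never A_i or b_i.
            shared[i] = x[i] + u[i] + rng.normal(0.0, noise_std, size=d)
        # z = argmin (lam/2)||z||^2 + (rho/2) * sum_i ||shared_i - z||^2
        z = rho * np.sum(shared, axis=0) / (lam + n * rho)
    return z


# Usage: five users with synthetic local data, compared with centralized ridge.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])
parts = []
for _ in range(5):
    A = rng.normal(size=(40, 3))
    parts.append((A, A @ w_true + 0.1 * rng.normal(size=40)))

A_all = np.vstack([A for A, _ in parts])
b_all = np.concatenate([b for _, b in parts])
w_central = np.linalg.solve(A_all.T @ A_all + 1.0 * np.eye(3), A_all.T @ b_all)

w_async = async_admm_ridge(parts, lam=1.0, rho=40.0, rounds=500)
w_noisy = async_admm_ridge(parts, lam=1.0, rho=40.0, rounds=500, noise_std=0.05)
print(w_central, w_async, w_noisy)
```

Setting ρ near the scale of the local curvature (the eigenvalues of A_i^T A_i, roughly 40 for this synthetic data) keeps the sketch converging quickly. Without noise, the asynchronous iterates reach the centralized ridge solution; with `noise_std > 0` the result drifts away from it, which is the privacy-utility trade-off in miniature.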
