Abstract

Protecting users’ location traces against inference attacks on aggregate mobility data, collected from multiple users in many real-world applications, is crucial. Most existing work on aggregate mobility data focuses on inference attacks rather than on designing privacy-preserving release mechanisms, and the few differentially private release mechanisms that exist suffer from poor utility-privacy tradeoffs. In this paper, we propose optimal centralized privacy-preserving aggregate mobility data release mechanisms (PAMDRMs) that minimize leakage from an information-theoretic perspective by releasing perturbed versions of the raw aggregate location. Specifically, we use mutual information to measure user-level and aggregate-level privacy leakage separately, and formulate leakage minimization problems under utility constraints. Since directly solving these optimization problems incurs complexity exponential in the users’ trace length, we transform them into belief-state Markov Decision Processes (MDPs), with a focus on the MDP formulation for the user-level privacy problem. We build reinforcement learning (RL) models and leverage the efficient Asynchronous Advantage Actor-Critic (A3C) RL algorithm to derive the solutions to the MDPs as our optimal PAMDRMs. We compare them with two state-of-the-art privacy protection mechanisms, PDPR (a context-aware local design) and DMLM (a context-free centralized design), in terms of mutual information leakage and the adversary’s attack success (evaluated by her expected estimation error and a Jensen-Shannon divergence-based error). Extensive experimental results on both synthetic and real-world datasets demonstrate that the user-level PAMDRM performs best on both measures thanks to its context-aware property and centralized design. Although the aggregate-level PAMDRM achieves a better privacy-utility tradeoff than the other two mechanisms, it does not always outperform them on adversarial success, highlighting the need to consider privacy measures from different perspectives to avoid overestimating the level of privacy offered to users. Lastly, we discuss an alternative, fully data-driven approach to derive the optimal PAMDRM by leveraging adversarial training on limited data samples.
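To make the formulation concrete, the following is a minimal sketch of the user-level leakage minimization problem the abstract describes, under assumed notation (the symbols $X_t$, $A_t$, $Y_t$, $d$, and $D$ are illustrative and not taken from the paper). Writing $X_{1:T}$ for a user's true location trace over $T$ time steps, $A_{1:T}$ for the raw aggregate locations, and $Y_{1:T}$ for the released perturbed aggregates, the mechanism designer would solve

$$
\min_{p(y_{1:T} \mid a_{1:T})} \; I\big(X_{1:T};\, Y_{1:T}\big)
\quad \text{subject to} \quad
\mathbb{E}\big[d(A_t, Y_t)\big] \le D \;\; \text{for all } t,
$$

where $I(\cdot\,;\cdot)$ denotes mutual information, $d$ is a distortion (utility loss) measure, and $D$ is the utility budget; the aggregate-level variant replaces $X_{1:T}$ with $A_{1:T}$ in the objective. The Jensen-Shannon divergence used as one of the adversarial-error measures is the standard symmetrized and smoothed KL divergence,

$$
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q).
$$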
