Personalized information systems are information-filtering systems that tailor their functionality to the specific interests of their users. The ability of these systems to profile users based on their search queries at Google, disclosed locations on Twitter, or rated movies at Netflix is, on the one hand, what enables such intelligent functionality and, on the other, the source of serious privacy concerns. Building on the principle of data minimization, we propose a data-generalization mechanism that aims to protect users’ privacy against not fully trusted personalized information systems. In our approach, users may disclose personal data to such systems when they feel comfortable doing so; when they do not, they may replace specific, sensitive data with more general and thus less sensitive data before sharing it with the system in question. Generalization can therefore protect user privacy to a certain extent, but clearly at the cost of some information loss. In this work, we mathematically model an optimized version of this mechanism and theoretically investigate key properties of the privacy-utility trade-off it poses. Experimental results on two real-world datasets demonstrate how our approach contributes to privacy protection and show that it can outperform state-of-the-art perturbation techniques such as data forgery and suppression by providing higher utility at the same privacy level. On a practical level, the implications of our work for personalized online services are twofold. First, our mechanism allows each user to take charge of their own privacy individually, without relying on third parties or sharing resources with other users. Second, it provides privacy designers and engineers with a new data-perturbative mechanism with which to evaluate their systems when the data at hand can be generalized according to a hierarchy, most notably spatial generalization, with practical application in popular location-based services. In sum, we present a data-perturbation mechanism for privacy protection against user profiling that is optimal, deterministic, and local, and that assumes an untrusted model of third parties.
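To make the generalization idea concrete, the following is a minimal sketch of hierarchy-based generalization for location data. The taxonomy, the `generalize` function, and the place names are hypothetical illustrations under assumed data, not the paper's actual optimized mechanism; they only show how a specific value can be traded for an increasingly general ancestor before disclosure.

```python
# Hypothetical sketch: hierarchical generalization of location data.
# Each leaf value maps to a chain of increasingly general (and thus
# less sensitive) ancestors. The hierarchy and names are illustrative.

LOCATION_HIERARCHY = {
    "Times Square": ["Midtown Manhattan", "New York City", "New York State", "USA"],
    "Wall Street":  ["Lower Manhattan",  "New York City", "New York State", "USA"],
}

def generalize(value: str, level: int) -> str:
    """Replace `value` with an ancestor `level` steps up its hierarchy.

    Level 0 discloses the exact value; higher levels trade utility
    (precision) for privacy. Values beyond the hierarchy's depth are
    clamped to the most general ancestor.
    """
    ancestors = LOCATION_HIERARCHY.get(value, [])
    if level <= 0 or not ancestors:
        return value
    return ancestors[min(level, len(ancestors)) - 1]

if __name__ == "__main__":
    # The same datum disclosed at four privacy levels:
    # Times Square -> Midtown Manhattan -> New York City -> New York State
    for lvl in range(4):
        print(lvl, generalize("Times Square", lvl))
```

Because a mapping like this is deterministic and requires no external party, each user could in principle run it locally before sending data to the personalized service, which matches the local, user-centric setting described in the abstract.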