Privacy-Preserving Query Log Sharing Based on Prior N-Word Aggregation

Xuying Meng, Zhiwei Xu, Bo Chen, Yujun Zhang

doi:10.1109/trustcom.2016.0131

Abstract

Privacy-preserving query log sharing has attracted considerable attention, especially after the AOL privacy leak. The queries and URLs in query logs reflect user preferences, which can help improve the quality of personalized services. However, these logs may disclose users' sensitive information and thus need to be sanitized before publication. Existing solutions focus on how to sample records to satisfy a differential privacy guarantee and on hiding individual preferences in the sampled query logs by perturbation. However, all of them suffer from leakage in queries and URL accesses, as well as from extra redundancy. In this work, we propose to greedily select samples with high utility and to provide a privacy guarantee through prior estimation of n-word phrase utility. In the estimation, we use novel metrics to conduct differential semantic aggregation and to select the representative of each cluster, which helps leak less private information while releasing more useful information. Extensive experiments on real-world datasets demonstrate the utility of our solutions without compromising individual privacy, and the released query logs have been applied to personalized search.

Full Text