Abstract

Privacy-preserving query log sharing has attracted considerable attention, especially since the AOL query log privacy leak. The queries and URLs in query logs reflect user preferences, which can help improve the quality of personalized services. However, these logs may disclose users' sensitive information and thus need to be sanitized before publication. Existing solutions focus on how to sample records so as to satisfy a differential privacy guarantee and on how to hide individual preferences in the sampled query logs by perturbation. However, all of them suffer from leakage through queries and URL accesses, as well as from extra redundancy. In this work, we propose to greedily select samples with high utility and to provide a privacy guarantee through prior estimation of n-word phrase utility. In the estimation, we use novel metrics to conduct differential semantic aggregation and to select a representative for each cluster, which helps to leak less private information while releasing more useful information. Extensive experiments on real-world datasets demonstrate the utility of our solutions without compromising individual privacy, and the released query logs have been applied to personalized search.
