Abstract

Web search engines have become an indispensable online service for retrieving content on the Internet. However, using search engines raises serious privacy issues, as they gather large amounts of data about individuals through their search queries. Two main techniques have been proposed to query search engines privately. A first category of approaches, called unlinkability, aims at disassociating the query from the identity of its requester. A second category of approaches, called indistinguishability, aims at hiding users' queries or interests by either obfuscating the queries or forging fake queries. This paper presents a study of the level of protection offered by three popular solutions: Tor-based, TrackMeNot, and GooPIR. For this purpose, we present an efficient and scalable attack, SimAttack, which leverages a similarity metric to capture the distance between preliminary information about the users (i.e., their query history) and a new query. SimAttack de-anonymizes up to 36.7 % of queries protected by an unlinkability solution (i.e., Tor-based), and identifies up to 45.3 and 51.6 % of queries protected by indistinguishability solutions (i.e., TrackMeNot and GooPIR, respectively). In addition, SimAttack de-anonymizes 6.7 % more queries than state-of-the-art attacks and improves the performance of the attack on TrackMeNot by 23.6 %, while executing two orders of magnitude faster.

Highlights

  • Search engines (e.g., Google, Bing, Yahoo!) have become the preferred way for users to find content on the Internet

  • Results show that the number of fake queries generated by GooPIR has a limited impact on the user's privacy protection: SimAttack retrieves 60.2 % of initial queries when only one additional fake query is generated, and still retrieves 50.6 % of initial queries when 7 fake queries are generated

  • Combining an indistinguishability technique (i.e., TrackMeNot or GooPIR) on top of an unlinkability solution gives better protection to users' queries, especially if the adversary is not able to collect a large quantity of information about the user or if the user configures the indistinguishability solution to send a high number of fake queries


Summary

Introduction

Search engines (e.g., Google, Bing, Yahoo!) have become the preferred way for users to find content on the Internet.

The adversary can distinguish fake queries from real ones by comparing the similarity between the query q+ and the user profile Pid (i.e., sim(q+, Pid)) with the threshold δ.
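The threshold test on sim(q+, Pid) can be sketched as follows. This is an illustration under stated assumptions, not the paper's metric: the Dice coefficient over word sets, the max-over-history profile score, the convention that a query scoring at or above δ is treated as real, and the names `dice`, `sim`, `is_real`, and `filter_real` are all hypothetical.

```python
def dice(a: str, b: str) -> float:
    """Dice coefficient between the word sets of two queries (assumed metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return 2 * len(wa & wb) / (len(wa) + len(wb))


def sim(query: str, profile: list[str]) -> float:
    """sim(q+, Pid): best match between the query and the user's past queries (assumption)."""
    return max((dice(query, past) for past in profile), default=0.0)


def is_real(query: str, profile: list[str], delta: float) -> bool:
    """Classify a query as real when its similarity to the profile reaches the threshold delta."""
    return sim(query, profile) >= delta


def filter_real(queries: list[str], profile: list[str], delta: float) -> list[str]:
    """Keep only the queries the adversary would classify as real."""
    return [q for q in queries if is_real(q, profile, delta)]


# Hypothetical profile Pid and a mix of one real and one fake query.
profile = ["cheap flights paris", "paris hotel deals"]
observed = ["paris flights", "football scores tonight"]
print(filter_real(observed, profile, delta=0.5))
```

The choice of δ trades false positives against false negatives: a low threshold lets fake queries through as real, while a high one discards real queries that drift from the user's past interests.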

