Preserving privacy on the searchable internet

Ruxia Ma, Xiaofeng Meng, Zhongyuan Wang

doi:10.1108/17440081211258196

Abstract

Purpose: The Web is the largest repository of information. Personal information is usually scattered across various pages of different websites, and search engines have made it easier to find. An attacker may use search engines to gather a user's scattered information and infer private information from it. The authors call this kind of privacy attack a “Privacy Inference Attack via Search Engines”. The purpose of this paper is to provide a user‐side automatic detection service that detects privacy leakage before personal information is published.

Design/methodology/approach: In this paper, the authors propose a user‐side automatic detection service. In this service, the authors construct a user information correlation (UICA) graph to model the association between items of user information returned by search engines. The privacy inference attack is mapped to the decision problem of searching for a privacy inferring path with the maximal probability in the UICA graph, and this problem is proved to be nondeterministic polynomial time (NP)‐complete by a two‐step reduction. A Privacy Leakage Detection Probability (PLD‐Probability) algorithm is proposed to find the privacy inferring path: it combines two significant factors that can influence the vertices' probability in the UICA graph and uses a greedy algorithm to find the path.

Findings: The authors show that privacy inference attacks via search engines are a serious problem in real life. A user‐side automatic detection service is proposed to detect the risk of privacy inference. The authors conduct three kinds of experiments to evaluate the seriousness of the privacy leakage problem and the performance of the proposed methods. The results show that the algorithm for the service is reasonable and effective.

Originality/value: The paper introduces a new family of privacy attacks on the Web, the privacy inference attack via search engines, and presents a privacy inferring model that describes the process and principles of this attack. A user‐side automatic detection service is proposed to detect privacy inference before personal information is published. In this user‐side service, the authors propose a Privacy Leakage Detection Probability (PLD‐Probability) algorithm. Extensive experiments show that these methods are reasonable and effective.
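
The greedy search for a privacy inferring path mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' PLD‐Probability algorithm: the abstract does not name the two factors that determine vertex probabilities, so the sketch simply assumes each edge already carries an association probability. The graph layout, the function name `greedy_inference_path`, and the example items are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical graph layout (an assumption, not the paper's UICA definition):
# each vertex maps to its neighbours and an assumed association probability.
Graph = Dict[str, Dict[str, float]]


def greedy_inference_path(
    graph: Graph, source: str, target: str
) -> Optional[Tuple[List[str], float]]:
    """Greedily follow the most probable unvisited edge from source to target.

    Returns (path, product of edge probabilities), or None if the walk
    reaches a dead end before arriving at the target.
    """
    path = [source]
    probability = 1.0
    visited = {source}
    current = source
    while current != target:
        # Only consider neighbours that are not already on the path.
        candidates = {
            v: p for v, p in graph.get(current, {}).items() if v not in visited
        }
        if not candidates:
            return None  # dead end: the greedy walk does not backtrack
        nxt = max(candidates, key=candidates.get)
        probability *= candidates[nxt]
        visited.add(nxt)
        path.append(nxt)
        current = nxt
    return path, probability


# Made-up example data, for illustration only.
example: Graph = {
    "nickname": {"email": 0.9, "forum_post": 0.4},
    "email": {"employer": 0.8},
    "forum_post": {"home_city": 0.7},
    "employer": {"home_address": 0.5},
}

result = greedy_inference_path(example, "nickname", "home_address")
dead_end = greedy_inference_path(example, "forum_post", "home_address")
```

Because the walk always takes the locally best edge and never backtracks, it can miss the path with the true maximal probability or stop at a dead end. That trade-off is what one would expect from a greedy heuristic applied to a problem the abstract states is NP‐complete.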

Full Text