Privacy and Anonymization in Social Networks

B K Tripathy, M S Sishodia, Sumeet Jain, Anirban Mitra

doi:10.1007/978-3-319-05164-2_10

Abstract

As the Internet continues to grow, the proliferation of online social networks (OSNs) raises many privacy concerns. The users of these OSNs divulge extensive details about their lives online. Attackers can use this personal information to perpetrate significant privacy breaches and carry out attacks such as identity theft and credit card fraud. The privacy concerns arise not just from users posting their personal information online, but also from OSNs publishing this information for analysis. Driven by Web 2.0 applications, more and more social network data has been made publicly available. Preserving the privacy of individuals in this published data is an important concern. Although privacy preservation in data publishing has been studied extensively, and several important models such as k-anonymity and l-diversity as well as many efficient algorithms have been proposed, most of the existing studies deal with relational data only. Those methods cannot be applied to social network data straightforwardly. Anonymizing social network data is a much more challenging task than anonymizing relational data. Firstly, in relational databases, attacks come from identifying individuals through quasi-identifiers, whereas in social networks, structural information such as neighbourhood graphs can also be used to identify individuals. Secondly, tuples in relational data can be anonymized without affecting other tuples, whereas in social networks, adding edges or vertices also affects the neighbourhoods of other vertices in the graph. In this chapter, we give a brief overview of the privacy concerns in online social networks and provide a detailed description of our algorithm, GASNA, a greedy algorithm for social network anonymization. This algorithm provides structural anonymity and sensitive attribute protection by achieving k-anonymity and l-diversity in social network data.
We also discuss the challenges faced by the existing algorithms and models for social network data privacy and suggest techniques to counter these challenges. The issues discussed are the high cost of achieving k-anonymity when the value of k is fixed and the need for a better anonymity model that suits the current scenario of social networks. We also propose a new model called partial anonymity, which can help reduce the number of edges added for anonymization when the value of d in the d-neighbourhood is greater than 1.
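
The two observations in the abstract, that structure alone can identify a vertex and that adding an edge changes the neighbourhoods of other vertices, can be illustrated with a minimal Python sketch. It is not GASNA: it assumes vertex degree as a crude stand-in for the neighbourhood structure an attacker might know, and the toy graph, the sensitive labels, and all function names are hypothetical.

```python
from collections import defaultdict

def degree_groups(adj):
    """Group vertices by degree (assumed stand-in for structural equivalence)."""
    groups = defaultdict(list)
    for vertex, neighbours in adj.items():
        groups[len(neighbours)].append(vertex)
    return dict(groups)

def is_k_anonymous(adj, k):
    """True if every degree group contains at least k vertices."""
    return all(len(group) >= k for group in degree_groups(adj).values())

def is_l_diverse(adj, sensitive, l):
    """True if every degree group holds at least l distinct sensitive values."""
    return all(
        len({sensitive[v] for v in group}) >= l
        for group in degree_groups(adj).values()
    )

def add_edge(adj, u, v):
    """Add an undirected edge; note that it changes the degree of BOTH endpoints."""
    adj[u].add(v)
    adj[v].add(u)

# Hypothetical toy graph: the path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
# Hypothetical sensitive attribute for each vertex.
sensitive = {"a": "flu", "b": "flu", "c": "asthma"}

# Vertex b is the only vertex of degree 2, so its degree identifies it.
before = is_k_anonymous(adj, 2)

# Adding the edge a - c hides b, but only by also altering a and c.
add_edge(adj, "a", "c")
after = is_k_anonymous(adj, 2)

# The single remaining group {a, b, c} holds two distinct sensitive values.
two_diverse = is_l_diverse(adj, sensitive, 2)
three_diverse = is_l_diverse(adj, sensitive, 3)
```

In this sketch the graph becomes 2-anonymous only after an edge is added between two vertices that were not themselves the problem, which is the side effect that makes graph anonymization harder than tuple-by-tuple anonymization of relational data.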

Full Text