Abstract

The growth of several popular social networks and the publication of social network data have increased the risk of leaking individuals' sensitive and confidential information. This calls for privacy preservation before a user's data, available from their Online Social Network (OSN) presence, is published. Numerous algorithms, such as K-anonymity and L-diversity, have been proposed for preserving the privacy of social network users' information. Previous work has shown good results based on the concept of adding edges and noisy nodes to achieve K-anonymity and L-diversity. K-anonymization techniques can prevent identity disclosure but are not sufficient to prevent the disclosure of users' sensitive information. In this direction, a number of techniques for preserving the sensitive information of social network users have been proposed. Although these techniques achieve anonymity with reasonably good results, they also cause a substantial change in the original structure of the OSN. In this article, the problems of preventing sensitive attribute disclosure and reducing the number of noisy nodes are addressed by perturbing the sensitive attributes. Existing research relies on L-diversity to prevent sensitive attribute disclosure, which remains vulnerable to skewness and similarity attacks. We address skewness attacks by removing duplicate noisy nodes from the final dataset that OSN service providers publish for stakeholders. All information about the duplicate nodes is stored in a table named the Reference Attribute Table (RAT), which is accessible only to the service providers for de-anonymizing user data. The proposed technique has been extensively evaluated using five metrics, viz. APL, ACSPL, RRTI, the number of noisy nodes, and information loss, on four real-world OSN datasets, namely CORA, ARNET, DBLP, and Twitter. The results for APL and RRTI show that there is little change in the structure of the datasets after anonymization, and the results for ACSPL show that the proposed technique preserves the sensitive attributes in the datasets. The maximum proportion of noisy nodes across all four datasets is 5.4% and the maximum information loss is 2.2%. The evaluation results make it evident that the proposed technique ensures privacy preservation with little information loss, thus preserving the utility of the published data.
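To make the de-duplication idea concrete, the Python sketch below separates the dataset to be published from a provider-side Reference Attribute Table. It is only an illustration under assumed names: the node representation (dictionaries with an "id", an "is_noisy" flag, and attribute values), the function remove_duplicate_noisy_nodes, and the duplicate criterion (identical attribute values among noisy nodes) are assumptions for exposition, not the authors' implementation.

from collections import defaultdict

def remove_duplicate_noisy_nodes(nodes):
    """Split node records into (published, rat).

    Assumed input: a list of dicts such as
        {"id": 7, "is_noisy": True, "sensitive": "diabetes"}
    Real user nodes are always published. Among noisy nodes, only the
    first occurrence of each attribute combination is published; later
    duplicates are moved into the Reference Attribute Table (RAT),
    which only the service provider keeps for de-anonymization.
    """
    published = []           # anonymized dataset released to stakeholders
    rat = defaultdict(list)  # RAT: retained privately by the OSN provider
    seen = set()

    for node in nodes:
        if not node.get("is_noisy"):
            published.append(node)
            continue
        # Key over attribute values, ignoring the identifier.
        key = tuple(sorted((k, v) for k, v in node.items() if k != "id"))
        if key in seen:
            # Duplicate noisy node: record in the RAT, do not publish.
            rat[key].append(node)
        else:
            seen.add(key)
            published.append(node)
    return published, dict(rat)

In this sketch the provider would persist rat privately and release only published; the actual RAT schema, the duplicate criterion, and the perturbation of sensitive attributes in the paper may differ.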
