Sanitizing and measuring privacy of large sparse datasets for recommender systems

Julián Salas

doi:10.1007/s12652-019-01391-2

Abstract

Big Data is characterized by large volumes of highly dynamic data and is used to discover hidden trends and correlations. However, as more data is collected, previously published pieces of information can be combined to link private records. In this context, when protecting the privacy of data subjects, the very attributes that are to be protected may be used for further re-identification; that is, sensitive attributes may also act as quasi-identifiers. For example, in high-dimensional data such as recommendations, transaction records or geo-located data, previously published transactions and locations may be used to uncover further private transactions and locations. In this paper, we propose a k-anonymization algorithm and a privacy metric for databases in which every attribute is both a quasi-identifier and a sensitive attribute. We apply our algorithm to high-dimensional datasets for model-based and memory-based collaborative filtering, and use the metric to compare the privacy provided by different protection methods, such as k-anonymity and differential privacy. We show the applicability of our method through tests on the large, sparse MovieLens 20M dataset, which contains 20 million ratings given by 138,493 users to 27,278 movies.
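The abstract does not spell out the anonymization procedure, so the following is only a minimal sketch of the general idea under stated assumptions: k-anonymity for a sparse ratings table in which every item column is treated as both a quasi-identifier and a sensitive attribute. It uses plain greedy microaggregation (a standard technique, not necessarily the algorithm proposed in the paper), grouping users by the Jaccard similarity of the sets of items they rated and releasing each group's mean profile. The dict-of-dicts data layout and the names `microaggregate`, `jaccard` and `min_class_size` are hypothetical; `min_class_size` is the usual equivalence-class check for k-anonymity, not the privacy metric the paper proposes.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical data layout: user id -> {item id: rating}.
Ratings = Dict[str, Dict[str, float]]


def jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Similarity of two users based only on which items they rated."""
    union = a.keys() | b.keys()
    if not union:
        return 1.0
    return len(a.keys() & b.keys()) / len(union)


def microaggregate(ratings: Ratings, k: int) -> Ratings:
    """Greedy microaggregation: split users into groups of at least k and
    give every member of a group the same (mean) rating profile."""
    if k < 1 or len(ratings) < k:
        raise ValueError("need k >= 1 and at least k users")
    remaining = set(ratings)
    groups: List[List[str]] = []
    while len(remaining) >= 2 * k:
        seed = min(remaining)  # deterministic choice of group seed
        # The k - 1 remaining users whose rated-item sets are closest to the seed.
        nearest = sorted(
            remaining - {seed},
            key=lambda u: (-jaccard(ratings[seed], ratings[u]), u),
        )[: k - 1]
        group = [seed] + nearest
        groups.append(group)
        remaining.difference_update(group)
    groups.append(sorted(remaining))  # the last group has k to 2k - 1 users

    released: Ratings = {}
    for group in groups:
        sums: Dict[str, float] = {}
        counts: Counter = Counter()
        for user in group:
            for item, r in ratings[user].items():
                sums[item] = sums.get(item, 0.0) + r
                counts[item] += 1
        # Mean over the members who rated each item, so profiles stay sparse.
        centroid = {item: sums[item] / counts[item] for item in sums}
        for user in group:
            released[user] = dict(centroid)
    return released


def min_class_size(released: Ratings) -> int:
    """Size of the smallest set of users sharing an identical released profile."""
    classes = Counter(frozenset(p.items()) for p in released.values())
    return min(classes.values())


if __name__ == "__main__":
    ratings = {
        "u1": {"m1": 5.0, "m2": 3.0},
        "u2": {"m1": 4.0, "m2": 2.0},
        "u3": {"m3": 1.0},
        "u4": {"m3": 2.0, "m4": 5.0},
        "u5": {"m2": 4.0},
    }
    released = microaggregate(ratings, k=2)
    print(min_class_size(released))  # prints 2
```

Averaging only over the members who rated an item keeps each released profile as sparse as the union of its group, which matters for data like MovieLens 20M. The naive neighbour search above rescans every remaining user for each group, so at the scale of 138,493 users it would need an index or sampling to be practical.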

Journal: Journal of Ambient Intelligence and Humanized Computing
Publication Date: Jul 13, 2019
Citations: 16

Similar Papers

AQ-DP: A New Differential Privacy Scheme Based on Quasi-Identifier Classifying in Big Data
Haifeng Ke ... Shui Yu
01 Dec 2018

Privacy in Big Data Through Variable t-Closeness for MSN Attributes
Zakariae El Ouazzani ... Hanan El Bakkali
28 Jul 2018

Differential Privacy and Federal Data Releases
Jerome P Reiter
Annual Review of Statistics and Its Application | VOL. 6
07 Mar 2019

PDP-SAG: Personalized Privacy Protection in Moving Objects Databases by Combining Differential Privacy and Sensitive Attribute Generalization
Fatemeh Deldar ... Mahdi Abadi
IEEE Access | VOL. 7
01 Jan 2019