Abstract

Big Data is characterized by large volumes of highly dynamic data and is used to discover hidden trends and correlations. However, as more data is collected, earlier pieces of information can be combined to facilitate the linkage of private records. In this context, the very attributes that are to be protected may themselves be used for re-identification; that is, sensitive attributes may act as quasi-identifiers. For example, in high-dimensional data such as recommendations, transaction records, or geo-located data, previously published transactions and locations may be used to uncover further private transactions and locations. In this paper, we propose a k-anonymization algorithm and a privacy metric for databases in which every attribute is both a quasi-identifier and a sensitive attribute. We apply our algorithm to high-dimensional datasets for model-based and memory-based collaborative filtering, and use the metric to compare the privacy provided by different protection methods, such as k-anonymity and differential privacy. We show the applicability of our method through tests on a large, sparse dataset (MovieLens 20M) of 20 million ratings that 138,493 users gave to 27,278 movies.
