Abstract
Data publishing is a challenging task under privacy-preservation constraints. To ensure privacy, many anonymization techniques have been proposed; they differ in the mathematical properties they satisfy and in the functional objectives they pursue. Disassociation is one such technique: it anonymizes set-valued datasets (e.g., discrete locations, search and shopping items) while guaranteeing the confidentiality property known as k^m-anonymity. Disassociation separates the items of each itemset into vertical chunks to create ambiguity about the original associations. In previous work, we defined an ant-based clustering algorithm for the disassociation technique that preserves certain associations of items, called utility rules, throughout the anonymization process, so that they remain available for accurate analysis. In this paper, we examine the disassociated dataset from the perspective of knowledge extraction. To make data analysis on top of the anonymized dataset easier, we define neighbor datasets, that is, datasets that result from a probabilistic re-association process. To assess the neighborhood notion, set-valued datasets are formalized as trees and a tree edit distance (TED) is applied directly between these neighbors. Finally, we show experimentally that the neighbor datasets remain faithful for knowledge extraction in future analyses.
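As a rough illustration of the two notions the abstract relies on, the sketch below projects a cluster of records onto a fixed partition of the item domain into vertical chunks and checks whether each chunk is k^m-anonymous, i.e., whether every combination of at most m items occurring in a chunk is contained in at least k subrecords. The toy records, the chunk partition and the function names (`project_to_chunks`, `is_km_anonymous`) are hypothetical; the full disassociation technique of [2] also partitions records horizontally, chooses which items go into which chunk, and handles items that fit no chunk separately, all of which this sketch omits.

```python
from collections import Counter
from itertools import combinations


def project_to_chunks(records, chunks):
    """Split each record vertically: one (possibly empty) subrecord per chunk."""
    return [[record & chunk for record in records] for chunk in chunks]


def is_km_anonymous(subrecords, k, m):
    """True if every itemset of size <= m that occurs in some subrecord
    is contained in at least k subrecords."""
    support = Counter()
    for sub in subrecords:
        # Each combination is generated once per subrecord that contains it,
        # so the counter ends up holding the support of that combination.
        for size in range(1, min(m, len(sub)) + 1):
            for combo in combinations(sorted(sub), size):
                support[combo] += 1
    return all(count >= k for count in support.values())


# Hypothetical cluster of set-valued records (cities visited per person).
records = [
    frozenset({"Paris", "Lyon", "Nice"}),
    frozenset({"Paris", "Lyon", "Lille"}),
    frozenset({"Paris", "Lyon", "Nice", "Lille"}),
]
# Hypothetical vertical partition of the item domain into two chunks.
chunks = [frozenset({"Paris", "Lyon", "Nice"}), frozenset({"Lille"})]

print(is_km_anonymous(records, k=2, m=2))  # -> False ({Nice, Lille} occurs once)
for chunk, subs in zip(chunks, project_to_chunks(records, chunks)):
    print(sorted(chunk), is_km_anonymous(subs, k=2, m=2))
# -> ['Lyon', 'Nice', 'Paris'] True
# -> ['Lille'] True
```

The pair {Nice, Lille} singles out the third record in the original cluster, but once the items are split across chunks that association is no longer published, and each chunk on its own satisfies 2^2-anonymity.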
Highlights
Set-valued data is one of the data formats that can be extracted from social networks; it is characterized by associating a set of values with each individual
This work is an extension of [1], which studies the privacy-utility trade-off of certain predefined associations in set-valued data anonymized through the disassociation technique, first defined in [2]
To illustrate the dilemma between data analysis and privacy preservation in set-valued data, let us consider an example of mobility data that stores the GPS locations of individuals, where each record corresponds to the set of cities visited by an individual
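To make the mobility example concrete, the sketch below uses a hypothetical toy dataset (names and cities are invented for illustration) and counts how many published records contain the cities an attacker already knows an individual, say Bob, visited. When only one record matches, publishing the sets as is singles Bob out and discloses the rest of his record. The function name `matching_records` is hypothetical.

```python
# Hypothetical toy mobility data: each record is the set of cities one person visited.
records = {
    "Alice": {"Paris", "Lyon", "Nice"},
    "Bob": {"Paris", "Lyon", "Lille", "Nantes"},
    "Carol": {"Paris", "Nice"},
    "Dave": {"Lyon", "Lille"},
}


def matching_records(published, known_items):
    """Return the published records that contain every item the attacker knows."""
    return [r for r in published if known_items <= r]


# Identities are removed, but the sets themselves are published unchanged.
published = list(records.values())
known = {"Lille", "Paris"}  # background knowledge: Bob visited Lille and Paris

candidates = matching_records(published, known)
print(len(candidates))  # -> 1
print(sorted(candidates[0] - known))  # -> ['Lyon', 'Nantes']
```

Two known cities are enough here to isolate a single record, which is exactly the kind of attack that k^m-anonymity bounds by requiring at least k matching records for any m known items.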
Summary
Set-valued data is one of the data formats that can be extracted from social networks; it is characterized by associating a set of values with each individual. This work is an extension of [1], which studies the privacy-utility trade-off of certain predefined associations in set-valued data anonymized through the disassociation technique, first defined in [2]. Using data mining techniques and learning algorithms, machine learning infers models of the processes underlying the data in order to predict possible future outcomes. To benefit from the large amount of set-valued data, association rule mining is deployed in many fields, from discovering links between diseases to marketing and retail. Publishing the dataset as is fails to protect the privacy of Bob's mobility data.
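Since association rule mining is the analysis the summary targets, here is a minimal sketch of the two standard rule measures on set-valued records: the support of an itemset is the fraction of records containing it, and the confidence of a rule X -> Y is support(X ∪ Y) / support(X). The toy records and the rule are hypothetical, and a real miner (e.g., Apriori) would also enumerate candidate rules, which is omitted here. Comparing such measures on the original dataset and on an anonymized or neighbor dataset is one way to gauge how well associations survive anonymization.

```python
from fractions import Fraction


def support(records, itemset):
    """Fraction of records that contain every item of `itemset`."""
    return Fraction(sum(1 for r in records if itemset <= r), len(records))


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(records, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the records")
    return support(records, antecedent | consequent) / base


# Hypothetical set-valued records (cities visited per person).
records = [
    {"Paris", "Lyon", "Nice"},
    {"Paris", "Lyon", "Lille"},
    {"Paris", "Nice"},
    {"Lyon", "Lille"},
]

print(support(records, {"Paris", "Lyon"}))  # -> 1/2
print(confidence(records, {"Paris"}, {"Lyon"}))  # -> 2/3
```

Exact fractions are used so that measures computed on different datasets can be compared without floating-point noise.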