Transactional Data Anonymization for Privacy and Information Preservation via Disassociation and Local Suppression

Xiangwen Liu, Xia Feng, Yuquan Zhu

doi:10.3390/sym14030472

Abstract

Ubiquitous devices in IoT-based environments create a large amount of transactional data on daily personal behaviors. Releasing these data across various platforms and applications for data mining can create tremendous opportunities for knowledge-based decision making. However, solid guarantees on the risk of re-identification are required to make these data broadly available. Disassociation is a popular method for anonymizing transactional data against re-identification attacks in privacy-preserving data publishing. The disassociation algorithm runs in parallel, which suits asymmetric parallel data processing in IoT, where nodes have limited computation power and storage space. However, the algorithm relies on global recoding to achieve k^m-anonymity for transactional data, which loses combinations of items in transactional datasets and thus decreases the data quality of the published transactions. To address this issue, we propose a novel vertical partition strategy in this paper. By employing local suppression and global partition, we first eliminate the itemsets that violate k^m-anonymity to construct the first k^m-anonymous record chunk. Then, through itemset creation and reduction, we recombine the globally partitioned items from the first record chunk to construct the remaining k^m-anonymous record chunks. The experiments illustrate that our scheme retains more associations between items in the dataset, which improves the utility of the published data.

Highlights

In the age of IoT, large-scale data on human behavior are generated from interactions between IoT devices and humans, or by devices that provide simple data, such as sensing data.
To address the above problems, we present a novel vertical partition scheme based on the anonymization techniques, i.e., disassociation and local suppression (DLS), to preserve data utility while protecting the privacy of disassociated data.
First record chunk construction: inspired by the approach proposed in the literature [43], we identify all itemsets violating k^m-anonymity and eliminate them by local suppression and global partition to create the first record chunk.
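The step of identifying itemsets that violate k^m-anonymity can be illustrated with a minimal sketch. It assumes the commonly used definition, under which a dataset is k^m-anonymous if every combination of at most m items that occurs in some record occurs in at least k records. This is a brute-force check for illustration only, not the DLS algorithm itself; the function names `km_violations` and `is_km_anonymous` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def km_violations(records, k, m):
    """Return {itemset: support} for itemsets of size <= m seen in fewer than k records.

    Brute-force illustration (an assumption, not the paper's method):
    it enumerates every item combination of size 1..m in each record.
    """
    support = Counter()
    for record in records:
        items = sorted(set(record))  # canonical order so tuples are comparable
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    return {itemset: count for itemset, count in support.items() if count < k}


def is_km_anonymous(records, k, m):
    """True when no itemset of size <= m has support below k."""
    return not km_violations(records, k, m)


# Toy transactional dataset (hypothetical): each record is a set of items.
records = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
]

violations = km_violations(records, k=2, m=2)
for itemset, count in sorted(violations.items()):
    print(itemset, count)
```

In this toy dataset, the item `d` and the pairs `(b, c)` and `(b, d)` each appear in only one record, so they are the itemsets that an anonymization step such as suppression or partitioning would need to handle for k = 2, m = 2. The enumeration grows combinatorially with m, so a sketch like this is only practical for small m and short records.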

Summary

Introduction

In the age of IoT, large-scale data on human behavior are generated from interactions between IoT devices and humans, or by devices that provide simple data, such as sensing data. Human behavior data, such as medical treatment records and click-stream data, are common and usually organized as transactional data (set-valued data). Publishing and sharing transactional data for statistical analysis, prediction, or critical decisions across application areas is pivotal to advances in knowledge-based services and new scientific discoveries. Transactional data often contain detailed information about individuals. If a transactional record in a dataset is so specific that few people can match it, an adversary with background knowledge may be able to uniquely identify the victim's record and their sensitive information. There is therefore an urgent demand for privacy-preserving transactional data publishing.

Objectives

Methods

Results

Conclusion