EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data Publishing

T Qamar, N Z Bawany, N A Khan

doi:10.48084/etasr.3374

Abstract

The evolution of the Internet into the Internet of Things (IoT) has caused an exponential rise in data collection. This drastic increase in the collection of personal information represents a serious threat to individual privacy. Privacy-Preserving Data Publishing (PPDP) is an area that provides ways of sharing data in anonymized form, i.e. without disclosing the identities of the individuals concerned. Various anonymization models are available in PPDP to guard privacy against numerous attacks. However, selecting the optimum model, one that balances utility and privacy, is challenging. This study proposes the Efficient Data Anonymization Model Selector (EDAMS) for PPDP, which generates an anonymized dataset optimized in terms of both privacy and utility. EDAMS takes a dataset and the required parameters as input and produces an anonymized version of it by applying PPDP techniques while balancing utility and privacy. EDAMS currently incorporates three PPDP techniques: k-anonymity, l-diversity, and t-closeness. It was tested on different variations of three datasets, and the results were validated by testing each variation explicitly with each of these techniques. The results show that EDAMS selects the optimum model with minimal effort.
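The abstract names the three privacy models EDAMS chooses among but not how the choice is made. The Python sketch below is illustrative only: the toy table, the generalization hierarchy in `generalize`, and the rule in `select_level` (pick the least-generalized level that satisfies all requested models) are assumptions, not the EDAMS algorithm. It uses distinct l-diversity and, for t-closeness, the Earth Mover's Distance with equal ground distance between categories.

```python
from collections import Counter, defaultdict

# Invented toy table: (age, zip_code, disease). Age and ZIP code are the
# quasi-identifiers (QIs); disease is the sensitive attribute.
RECORDS = [
    (23, "47677", "flu"), (27, "47672", "flu"), (29, "47678", "cancer"),
    (35, "47905", "gastritis"), (38, "47909", "flu"), (36, "47906", "cancer"),
    (52, "47605", "gastritis"), (55, "47603", "cancer"), (57, "47607", "flu"),
]


def generalize(record, level):
    """Hypothetical hierarchy: level 0 keeps exact values; level 1 uses
    10-year age bands and masks the last ZIP digit; level 2 uses 20-year
    bands and masks the last three ZIP digits."""
    age, zip_code, disease = record
    if level == 0:
        return (str(age), zip_code), disease
    width, keep = (10, 4) if level == 1 else (20, 2)
    low = age // width * width
    return (f"{low}-{low + width - 1}", zip_code[:keep] + "*" * (5 - keep)), disease


def equivalence_classes(records, level):
    """Group sensitive values by their generalized QI combination."""
    classes = defaultdict(list)
    for record in records:
        qi, sensitive = generalize(record, level)
        classes[qi].append(sensitive)
    return list(classes.values())


def is_k_anonymous(classes, k):
    # Every equivalence class holds at least k records.
    return all(len(c) >= k for c in classes)


def is_l_diverse(classes, l_value):
    # Distinct l-diversity: at least l_value different sensitive values per class.
    return all(len(set(c)) >= l_value for c in classes)


def is_t_close(classes, t):
    # For a categorical attribute with equal ground distance, the Earth Mover's
    # Distance equals half the L1 distance between a class's distribution and
    # the distribution over the whole table.
    overall = Counter(v for c in classes for v in c)
    n = sum(overall.values())
    for c in classes:
        local = Counter(c)
        emd = 0.5 * sum(abs(local[v] / len(c) - overall[v] / n) for v in overall)
        if emd > t:
            return False
    return True


def select_level(records, k, l_value, t, max_level=2):
    """Hypothetical selection rule: return the least-generalized level (the one
    that keeps the most detail) satisfying all three models, or None."""
    for level in range(max_level + 1):
        classes = equivalence_classes(records, level)
        if (is_k_anonymous(classes, k) and is_l_diverse(classes, l_value)
                and is_t_close(classes, t)):
            return level
    return None


print(select_level(RECORDS, k=3, l_value=2, t=0.25))  # 1
print(select_level(RECORDS, k=3, l_value=3, t=0.25))  # 2
```

Raising l from 2 to 3 pushes the choice from level 1 to level 2: the extra privacy is paid for with coarser age bands and ZIP codes, which is the utility/privacy trade-off the abstract refers to.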

Highlights

The advent of the Internet of Things (IoT), high-speed processing hardware, and cloud storage with high-bandwidth communication produces vast amounts of data that would have been unthinkable a couple of decades ago.
The Efficient Data Anonymization Model Selector (EDAMS) was developed in Java and run on a 2.4GHz Intel Core i5 processor with 6GB RAM.
This study presents the data anonymization model selection tool EDAMS, which is capable of generating anonymized data with minimal effort.

Summary

Introduction

The advent of IoT, high-speed processing hardware, and cloud storage with high-bandwidth communication produces vast amounts of data that would have been unthinkable a couple of decades ago. Owing to these advancements, around 2.5 quintillion bytes of data are created each day [1]. The applications people use to carry out daily routine activities efficiently constantly collect, store, and track user data. The release of such micro-data makes it possible to track the public and private lives of the individuals concerned, putting their privacy at risk [3, 5, 6]. In the publishing phase, the data are provided to data recipients, such as data miners or other third parties, who may use the data for their own purposes.
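To make the risk of releasing micro-data concrete, here is a minimal, hypothetical linkage sketch in Python: removing names is not enough, because joining the release with a public list on quasi-identifiers can re-attach names to sensitive values. All records, names, and attribute choices below are invented for illustration.

```python
# Invented "de-identified" release: names removed, quasi-identifiers (QIs) kept.
RELEASED = [
    {"zip": "47677", "birth_year": 1996, "sex": "F", "disease": "flu"},
    {"zip": "47905", "birth_year": 1985, "sex": "M", "disease": "cancer"},
    {"zip": "47605", "birth_year": 1968, "sex": "F", "disease": "gastritis"},
]

# Hypothetical public list (e.g. a directory) carrying names and the same QIs.
PUBLIC = [
    {"name": "Alice", "zip": "47677", "birth_year": 1996, "sex": "F"},
    {"name": "Bob", "zip": "47905", "birth_year": 1985, "sex": "M"},
    {"name": "Carol", "zip": "47906", "birth_year": 1990, "sex": "F"},
]

QI = ("zip", "birth_year", "sex")


def link(released, public, qi=QI):
    """Re-identify released rows whose QI combination matches exactly one public row."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[a] for a in qi), []).append(person["name"])
    matches = {}
    for row in released:
        names = index.get(tuple(row[a] for a in qi), [])
        if len(names) == 1:  # a unique match discloses the sensitive value
            matches[names[0]] = row["disease"]
    return matches


print(link(RELEASED, PUBLIC))  # {'Alice': 'flu', 'Bob': 'cancer'}
```

Anonymization models such as k-anonymity counter this by generalizing the QIs so that each combination matches several records rather than one.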
