Extended systematic clustering: Microdata protection by distributing sensitive values

Widodo Widodo, Eko K Budiardjo, Harry T Y Achsan, Wahyu Catur Wibowo

doi:10.11591/eei.v9i4.1963

Abstract

Data anonymization for multiple sensitive attributes in microdata publishing is a growing field. The field has several anonymization models, such as k-anonymity and l-diversity, and generalization and suppression have become the common techniques for anonymizing data. However, the real problem with multiple sensitive attributes is how sensitive values are distributed: if sensitive values are not distributed evenly across the quasi identifier groups, they may be linked to the individuals who hold them. This research investigated how high-sensitive values can be distributed evenly into each group. We propose a novel method (algorithm) that distributes high-sensitive values while forming groups. The method distributes high-sensitive values evenly and varies the high-sensitive values within each group. We call it extended systematic clustering because it extends the systematic clustering method. Diversity metrics were used to evaluate the method. Experimental results show that our method outperforms systematic clustering, with an average diversity value of 0.9719 compared with 0.3316 for systematic clustering.
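The idea of spreading high-sensitive values evenly over groups can be illustrated with a minimal Python sketch. It is not the authors' extended systematic clustering, whose steps are not given in this summary; it simply deals high-sensitive records round-robin across a fixed number of groups, an assumed stand-in for the real grouping step. The function name distribute_high_sensitive, the is_high predicate, and the optional sort key are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def distribute_high_sensitive(
    records: List[Record],
    is_high: Callable[[Record], bool],
    num_groups: int,
    key: Optional[Callable[[Record], str]] = None,
) -> List[List[Record]]:
    """Spread high-sensitive records evenly over num_groups groups.

    Hypothetical sketch using plain round-robin dealing; this is NOT the
    authors' extended systematic clustering algorithm.
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    groups: List[List[Record]] = [[] for _ in range(num_groups)]
    high = [r for r in records if is_high(r)]
    low = [r for r in records if not is_high(r)]
    # Sorting by sensitive value places equal values next to each other, so
    # round-robin dealing spreads them across different groups (more variety).
    if key is not None:
        high = sorted(high, key=key)
    # Deal high-sensitive records one by one: per-group counts differ by at most 1.
    for i, rec in enumerate(high):
        groups[i % num_groups].append(rec)
    # Continue the same rotation for the remaining records to balance group sizes.
    for i, rec in enumerate(low, start=len(high)):
        groups[i % num_groups].append(rec)
    return groups


if __name__ == "__main__":
    HIGH = {"Exec-managerial", "Prof-specialty"}  # hypothetical high-sensitive set
    table = [{"occupation": o} for o in
             ["Exec-managerial", "Adm-clerical", "Exec-managerial",
              "Sales", "Prof-specialty", "Prof-specialty"]]
    for g in distribute_high_sensitive(table, lambda r: r["occupation"] in HIGH, 2,
                                       key=lambda r: r["occupation"]):
        print([r["occupation"] for r in g])
```

Run as a script, it prints two groups that each hold one Exec-managerial and one Prof-specialty record; the occupation values and the high-sensitive set are illustrative assumptions. A clustering-based method would additionally try to keep each group's quasi identifier values close, so that generalization loses little information.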

Highlights

Privacy is an important issue in publishing microdata tables, because microdata contains information about individuals and their identifying data
The contributions of this research are: (1) a novel algorithm for distributing high-sensitive values to each quasi identifier group, (2) a successful implementation of the method for multiple sensitive attributes, and (3) a categorization of sensitive values into sensitivity categories for each sensitive attribute
Three attributes are designated as sensitive attributes: education, workclass, and occupation
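Categorizing sensitive values across the three sensitive attributes can be sketched in Python as follows. The summary does not list which values the authors treat as high-sensitive, so the HIGH_SENSITIVE table below is purely an illustrative assumption, as are the helper names sensitivity_level and categorize; the value labels merely follow the style of the attribute names above.

```python
from typing import Dict, Set

# Hypothetical categorization: which values count as high-sensitive is an
# assumption for illustration, not the paper's actual category table.
HIGH_SENSITIVE: Dict[str, Set[str]] = {
    "education": {"Preschool", "1st-4th"},
    "workclass": {"Without-pay", "Never-worked"},
    "occupation": {"Armed-Forces", "Priv-house-serv"},
}


def sensitivity_level(record: Dict[str, str],
                      high: Dict[str, Set[str]] = HIGH_SENSITIVE) -> int:
    """Count the sensitive attributes whose value falls in the high-sensitive set."""
    return sum(record.get(attr) in values for attr, values in high.items())


def categorize(record: Dict[str, str],
               high: Dict[str, Set[str]] = HIGH_SENSITIVE) -> str:
    """Label a record 'high' if any sensitive attribute holds a high-sensitive value."""
    return "high" if sensitivity_level(record, high) > 0 else "low"


if __name__ == "__main__":
    person = {"education": "Preschool", "workclass": "Private",
              "occupation": "Armed-Forces"}
    print(sensitivity_level(person), categorize(person))  # 2 high
```

Counting matches per record, rather than only flagging any match, keeps the option of ranking records that are high-sensitive in several attributes at once.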

Summary

Introduction

Privacy is an important issue in publishing microdata tables, because microdata contains information about individuals and their identifying data. Individual data covers three types of attributes, called explicit identifiers (EI), quasi identifiers (QI), and sensitive attributes (SA) [1, 2]. An EI is an attribute that directly identifies a person, such as a name, employee number, or student identifier. QI attributes, such as age or zip code, can re-identify individuals when combined or linked with external data, while SA attributes hold the private values to be protected. In privacy preserving data publishing (PPDP), QI attributes are generalized or suppressed to obtain an anonymized table. Records whose QI attributes cannot be distinguished from one another form a quasi identifier group. A table in which every group has at least k records is called a k-anonymity table [3, 4, 5, 6, 7].
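The k-anonymity definition above can be checked mechanically: group records by their QI values and verify that every group has at least k records. The minimal Python sketch below assumes records are dictionaries whose QI values are already generalized; the function names and the sample table are illustrative, not from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def qi_group_sizes(records: List[Record], qi_attrs: Sequence[str]) -> Counter:
    """Count records per quasi identifier group (records sharing all QI values)."""
    return Counter(tuple(r[a] for a in qi_attrs) for r in records)


def is_k_anonymous(records: List[Record], qi_attrs: Sequence[str], k: int) -> bool:
    """True if every QI group contains at least k records."""
    return all(size >= k for size in qi_group_sizes(records, qi_attrs).values())


if __name__ == "__main__":
    # Illustrative table: age and zip are already generalized; occupation is the SA.
    table = [
        {"age": "30-39", "zip": "130**", "occupation": "Sales"},
        {"age": "30-39", "zip": "130**", "occupation": "Tech-support"},
        {"age": "40-49", "zip": "148**", "occupation": "Sales"},
        {"age": "40-49", "zip": "148**", "occupation": "Adm-clerical"},
    ]
    print(is_k_anonymous(table, ["age", "zip"], k=2))  # True
    print(is_k_anonymous(table, ["age", "zip"], k=3))  # False
```

Note that k-anonymity alone says nothing about the sensitive column: a group could satisfy k = 2 while both of its records share the same occupation, which is the kind of disclosure that distributing sensitive values across groups aims to prevent.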
