Abstract

With the advent of smart health, smart cities, and smart grids, the amount of collected data has grown rapidly. When such data is published for mining valuable information, privacy becomes a key concern because the data contains sensitive information. This sensitive information comprises either a single sensitive attribute (each individual has only one sensitive attribute) or multiple sensitive attributes (an individual can have several). Anonymizing data sets with multiple sensitive attributes presents unique problems because of the correlation among these attributes. Artificial intelligence techniques can help data publishers anonymize such data. To the best of our knowledge, no fuzzy logic-based privacy model has yet been proposed for preserving the privacy of multiple sensitive attributes. In this paper, we propose a novel privacy-preserving model, F-Classify, that uses fuzzy logic to classify quasi-identifiers and multiple sensitive attributes. Classes are defined by a set of rules, and every tuple is assigned to a class according to its attribute values. The working of the F-Classify algorithm is also verified using HLPN. Extensive experiments on healthcare data sets show that F-Classify surpasses its counterparts in terms of privacy and utility. Being based on artificial intelligence, it also has a lower execution time than other approaches.
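The rule-based assignment of tuples to classes described in the abstract can be pictured with a minimal sketch. The triangular membership function, the `AGE_CLASSES` breakpoints, and the helper names below are illustrative assumptions, not the fuzzy sets or rules used by F-Classify.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for an "age" quasi-identifier (assumed breakpoints).
AGE_CLASSES = {
    "young": (0, 20, 40),
    "middle": (30, 45, 60),
    "old": (50, 75, 100),
}


def classify(value, classes):
    """Return the class with the highest membership degree, plus all degrees."""
    memberships = {name: triangular(value, *abc) for name, abc in classes.items()}
    best = max(memberships, key=memberships.get)
    return best, memberships


label, degrees = classify(25, AGE_CLASSES)
print(label, degrees)
```

Picking the class with the highest membership degree is only one possible defuzzification choice; the paper's own inference rules may differ.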

Highlights

  • In the digital era, data collection and storage for eventual analysis are constantly expanding

  • Normalized Certainty Penalty (NCP) is calculated from the generalization steps in the case of (p, k) angelization, whereas in F-Classify it is based on the classification of attributes

  • Classifying sensitive attributes (SAs) with fuzzy classification provides multi-dimensional partitioning with minimal information loss
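To make the NCP highlight concrete, here is a minimal sketch of the commonly used NCP definition for a single numeric attribute: the width of a group's generalized range divided by the width of the attribute's domain, averaged over tuples. The function names and sample values are assumptions for illustration; how NCP is derived from classes in F-Classify is not specified in this section.

```python
def ncp_numeric(group_values, domain_min, domain_max):
    """NCP of one numeric attribute for one group: group range / domain range."""
    if domain_max == domain_min:
        return 0.0  # a constant attribute loses no information
    return (max(group_values) - min(group_values)) / (domain_max - domain_min)


def ncp_table(groups, domain_min, domain_max):
    """Tuple-weighted average NCP over all groups; 0 is best, 1 is worst."""
    total = sum(len(g) for g in groups)
    weighted = sum(len(g) * ncp_numeric(g, domain_min, domain_max) for g in groups)
    return weighted / total


# Hypothetical ages split into two groups, with an assumed domain of 20 to 60.
groups = [[20, 25, 30], [40, 60]]
print(ncp_table(groups, 20, 60))
```

A lower value means the published groups stay closer to the original values, which is the sense in which NCP measures information loss.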


Summary

Introduction

Data collection and storage for eventual analysis are constantly expanding. The collected data, which comprises explicit identifiers, quasi-identifiers (QIs), sensitive attributes (SAs), and insensitive attributes, can compromise individual privacy. Explicit identifiers, such as a name or a national identification number, almost always allow an individual to be re-identified. The privacy-preserving strategies presented in the literature [1,2,3] therefore usually remove them from data sets. Most of the methods proposed in the literature [1,2,3,4,5,6] focus on data sets with a single sensitive attribute and rely on single-dimensional generalization. In most cases, however, real-world data publishers hold data with multiple sensitive attributes (MSAs). For MSAs, these techniques fail to protect privacy because an adversary can breach it through background knowledge and non-membership knowledge attacks.

Motivation
Diagnostic Method
Methods
Literature Review
Evaluation
Preliminaries
Notation
Fuzzification
Proposed Approach
Linguistic Variables and Fuzzy Sets
Fuzzy Inference Rule-Based
Defuzzification
Permutation
F-Classify Algorithm
Formal Modeling and Analysis
Experimental Setup
Measurement of Privacy
Discernibility Penalty
Query Error
Execution Time Analysis
Conclusions