Abstract
With the advent of smart health, smart cities, and smart grids, the volume of collected data has grown rapidly. When this data is published so that valuable information can be mined from it, privacy becomes a key concern because the data contains sensitive information. A data set may hold either a single sensitive attribute per individual or multiple sensitive attributes per individual. Anonymizing data sets with multiple sensitive attributes poses unique problems because these attributes are correlated. Artificial intelligence techniques can help data publishers anonymize such data. To the best of our knowledge, no fuzzy logic-based privacy model has so far been proposed for preserving the privacy of multiple sensitive attributes. In this paper, we propose F-Classify, a novel privacy-preserving model that uses fuzzy logic to classify quasi-identifiers and multiple sensitive attributes. Classes are defined by a set of rules, and every tuple is assigned to a class according to its attribute values. The working of the F-Classify algorithm is also verified using HLPN. A wide range of experiments on healthcare data sets shows that F-Classify surpasses its counterparts in terms of privacy and utility. Being based on artificial intelligence techniques, it also has a lower execution time than the other approaches.
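The rule-based class assignment described in the abstract can be illustrated with a minimal Python sketch. This is not the F-Classify algorithm itself: the triangular membership functions, the `AGE_SETS` boundaries, the class labels, and the sample records are all hypothetical, chosen only to show how a tuple might be assigned to the fuzzy class in which its attribute value has the highest membership.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for an "age" quasi-identifier (assumed boundaries).
AGE_SETS = {
    "young": (0, 20, 40),
    "middle": (30, 45, 60),
    "old": (50, 75, 100),
}


def classify(value, fuzzy_sets):
    """Return the label with the highest membership, plus all memberships."""
    memberships = {
        label: triangular(value, *bounds) for label, bounds in fuzzy_sets.items()
    }
    best = max(memberships, key=memberships.get)
    return best, memberships


# Hypothetical tuples: one quasi-identifier and two sensitive attributes.
records = [
    {"age": 25, "disease": "flu", "treatment": "rest"},
    {"age": 47, "disease": "diabetes", "treatment": "insulin"},
    {"age": 70, "disease": "arthritis", "treatment": "physiotherapy"},
]

# Group every tuple under the class assigned to its age value.
classes = {}
for record in records:
    label, _ = classify(record["age"], AGE_SETS)
    classes.setdefault(label, []).append(record)

for label, members in classes.items():
    print(label, [m["age"] for m in members])
```

In this toy setup each record lands in a different class. A value that falls outside every set (membership 0 everywhere) would default to the first label, so a real rule base would need sets that cover the whole attribute domain.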
Highlights
In the digital era, data collection and storage for eventual analysis are constantly expanding.
In (p, k) angelization, the Normalized Certainty Penalty (NCP) is calculated from the generalization steps; in F-Classify, it is calculated from the classification of attributes.
Fuzzy classification of sensitive attributes (SAs) provides multi-dimensional partitioning with minimal information loss.
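For readers unfamiliar with the NCP mentioned in the highlights, the following Python sketch shows the commonly used definition of the metric: for a numeric attribute, the width of the generalized interval divided by the width of the attribute's domain; for a categorical attribute, the share of domain values covered by the generalized value. How F-Classify derives the penalty from its classes is not described here, so the function names and example values below are assumptions for illustration only.

```python
def ncp_numeric(low, high, domain_min, domain_max):
    """NCP of a numeric value generalized to the interval [low, high]."""
    if domain_max == domain_min:
        return 0.0
    return (high - low) / (domain_max - domain_min)


def ncp_categorical(covered_values, domain_size):
    """NCP of a categorical value generalized to a set of domain values."""
    covered = len(set(covered_values))
    # An ungeneralized value (a single leaf) carries no penalty.
    return 0.0 if covered <= 1 else covered / domain_size


def ncp_table(cell_penalties):
    """Average NCP over all cells; input is one list of penalties per record."""
    cells = sum(len(row) for row in cell_penalties)
    if cells == 0:
        return 0.0
    return sum(sum(row) for row in cell_penalties) / cells


# Hypothetical example: age 25 published as [20, 40] over a domain of [0, 100].
age_penalty = ncp_numeric(20, 40, 0, 100)

# Hypothetical example: a value published as {"flu", "cold"} out of 4 diseases.
disease_penalty = ncp_categorical({"flu", "cold"}, 4)

# Two records: one generalized, one published unchanged.
overall = ncp_table([[age_penalty, disease_penalty], [0.0, 0.0]])
print(age_penalty, disease_penalty, overall)
```

A value of 0 means no information was lost and 1 means the attribute was generalized to its whole domain, so lower NCP indicates higher utility.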
Summary
Data collection and storage for eventual analysis are constantly expanding. The collected data, which comprises explicit identifiers, quasi-identifiers (QIs), sensitive attributes (SAs), and insensitive attributes, can compromise individual privacy. Explicit identifiers, such as a name or a national identification number, almost always allow an individual to be re-identified. The privacy-preserving strategies presented in the literature [1,2,3] usually eliminate them from data sets. The majority of the methods proposed in the literature [1,2,3,4,5,6] focus on data sets with a single sensitive attribute and rely on single-dimensional generalization. In most real-world cases, however, data publishers hold multiple sensitive attributes (MSAs). For MSAs, these techniques fail to protect privacy because an adversary can breach it through background-knowledge and non-membership-knowledge attacks.