Abstract

Privacy-preserving data publishing is the process of releasing an anonymized dataset for analysis and research. Earlier work assumed that a dataset contains only one record per individual [1:1 dataset], an assumption that does not hold in many applications. Later research has concentrated on datasets in which an individual has multiple records [1:M dataset]. This paper proposes a model, f-slip, that addresses several attacks on 1:M datasets, namely the Background Knowledge (bk) attack, the Multiple Sensitive attribute correlation attack (MSAcorr), the Quasi-identifier correlation attack (QIcorr), the Non-membership correlation attack (NMcorr) and the Membership correlation attack (Mcorr), and presents solutions for them. In f-slip, anatomization divides the raw table into two sub-tables: (1) quasi-identifiers and (2) sensitive attributes. The correlation of the sensitive attributes is computed so that they can be anonymized without breaking the linking relationship. The quasi-identifier table is then divided further, and k-anonymity is applied to it. An efficient anonymization technique, frequency-slicing, is also developed to anonymize the sensitive attributes. The novelty of the f-slip model is that it slices records according to the frequency of occurrence of sensitive attribute values in each sub-table. The workload experiment shows that the f-slip model behaves consistently as the number of records increases. Extensive experiments on the real-world Informs dataset show that the f-slip model outperforms state-of-the-art techniques in terms of utility loss and efficiency, and that it achieves an optimal balance between privacy and utility.
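The anatomization and k-anonymity steps mentioned in the abstract can be illustrated with a small sketch. This is not the f-slip algorithm itself, whose details are not given here. It is a minimal illustration of the general ideas, under stated assumptions: the toy records, the attribute names (`pid`, `age`, `sex`, `disease`, `symptom`), the 20-year age bands and all function names are hypothetical. The last function only counts how often each sensitive value occurs in a group, which is the kind of statistic a frequency-based slicing step could use; it does not perform frequency-slicing.

```python
from collections import Counter, defaultdict

# Hypothetical 1:M microdata: individual 1 owns two records.
RAW = [
    {"pid": 1, "age": 34, "sex": "F", "disease": "flu", "symptom": "fever"},
    {"pid": 1, "age": 34, "sex": "F", "disease": "asthma", "symptom": "cough"},
    {"pid": 2, "age": 36, "sex": "F", "disease": "flu", "symptom": "fever"},
    {"pid": 3, "age": 52, "sex": "M", "disease": "diabetes", "symptom": "fatigue"},
    {"pid": 4, "age": 57, "sex": "M", "disease": "flu", "symptom": "cough"},
]


def generalize(row):
    """Coarsen the quasi-identifiers: age becomes a 20-year band (an assumption)."""
    low = row["age"] // 20 * 20
    return (f"{low}-{low + 19}", row["sex"])


def anatomize(rows):
    """Split rows into a quasi-identifier table and a sensitive-attribute table.

    The two tables are linked only through a group id, not through the person.
    """
    gids = {}          # generalized QI tuple -> group id
    qi_by_person = {}  # pid -> (age band, sex, gid); one row per individual
    sa_table = []      # (gid, disease, symptom); one row per raw record
    for row in rows:
        key = generalize(row)
        gid = gids.setdefault(key, len(gids))
        qi_by_person[row["pid"]] = key + (gid,)
        sa_table.append((gid, row["disease"], row["symptom"]))
    return list(qi_by_person.values()), sa_table


def is_k_anonymous(qi_table, k):
    """True if every generalized QI combination covers at least k individuals."""
    counts = Counter(row[:-1] for row in qi_table)
    return all(count >= k for count in counts.values())


def sensitive_frequencies(sa_table):
    """Count how often each disease value occurs in each group."""
    freq = defaultdict(Counter)
    for gid, disease, _symptom in sa_table:
        freq[gid][disease] += 1
    return freq


qi_table, sa_table = anatomize(RAW)
freq = sensitive_frequencies(sa_table)
print(qi_table)
print(is_k_anonymous(qi_table, 2))
print(dict(freq))
```

In this toy data the quasi-identifier table keeps one row per individual even though individual 1 has two records, which is the situation that distinguishes a 1:M dataset from a 1:1 dataset.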

Highlights

  • Various organizations and institutions publish their data for research, analysis, policy-making and decision-making, so that the data is available to the public and private sectors

  • The study addresses privacy-preserving data publishing on 1:M datasets

  • K-anonymity has been implemented on the quasi-identifier table

Summary

Introduction

Various organizations and institutions publish their data for research, analysis, policy-making and decision-making, so that the data is available to the public and private sectors. The data released by the health sector for analysis and research may contain personal information about an individual, such as explicit identifiers (e.g., name, SSN), quasi-identifiers (e.g., age, sex, race) and sensitive attributes (e.g., disease, symptoms, salary). Publishing such data together with private and personal information leads to a privacy breach that compromises the individual's privacy. Health-sector organizations therefore anonymize their microdata with existing privacy algorithms and models to protect individuals from various privacy breaches. The experimental evaluation and result analysis are explained and depicted through various graphs in Section 9, and Section 10 concludes the paper with its limitations and future directions.

Related Works
Motivation and Challenges
Contribution
Preliminaries
Correlation of Sensitive Attributes
Experimental evaluation and result analysis
Utility Loss
Execution time
10 Conclusion and Future Directions
Findings
Funding Details