Abstract
Privacy-preserving data publishing (PPDP) refers to the release of anonymized data for research and analysis. A considerable body of research addresses the publication of data with a single sensitive attribute. Practical PPDP scenarios with multiple sensitive attributes (MSAs) have not yet attracted much attention from researchers. A recently proposed technique, (p, k)-angelization, provided a novel solution in this regard by using a one-to-one correspondence between the buckets in the generalized table (GT) and the sensitive table (ST). However, we have investigated a possibility of privacy leakage through MSA correlation among linkable sensitive buckets and named it the “fingerprint correlation (fcorr) attack.” To mitigate it, in this paper we propose an improved solution, the (c, k)-anonymization algorithm. The proposed solution thwarts the fcorr attack using some privacy measures and improves the one-to-one correspondence to a one-to-many correspondence between the buckets in GT and ST, which further reduces the privacy risk and increases utility in GT. We have formally modelled and analysed the attack and the proposed solution. Experiments on real-world datasets show that the proposed solution outperforms its counterpart.
Highlights
Data generation and sharing have increased drastically in the current decade. The reason is the growing number of data sources brought about by extensive research and the smart revolution. The shared/published data are used by data researchers for research and analysis, which may involve data mining, statistical data analysis, and policy making.
In the context of health records, the data owners are the individuals to whom the data belong. The hospital that collects, manipulates, and shares those data is known as the data publisher. The data researchers may be a wide range of stakeholders. The collected data contain private information, partial identifiers, and confidential or sensitive information about the data owners.
In (p, k)-angelization [22], privacy can be breached under the fcorr attack. The tables published by the proposed (c, k)-anonymization are depicted in Tables 5 and 6. The “Name” attribute in Table 5 is not published while publishing the data. The proposed approach protects against the adversaries nmk and qik. The main contributions are as follows: (i) We propose an improvement of (p, k)-angelization, named the (c, k)-anonymization algorithm, for the privacy of multiple sensitive attributes (MSAs). The proposed solution prevents the fcorr attack.
Summary
Data generation and sharing have increased drastically in the current decade. The reason is the growing number of data sources brought about by extensive research and the smart revolution (smart grids, cities, devices, etc.). The shared/published data are used by data researchers for research and analysis, which may involve data mining, statistical data analysis, and policy making. Therefore, the scenario in this paper is more challenging, as we consider the dimensionality of the quasi-identifiers (QIs) as well as more than one sensitive attribute, i.e., MSAs. An adversary or attacker is a person who tries to breach data privacy using different types of background knowledge (bk) about the MSA dataset. In our proposed (c, k)-anonymization algorithm, the bucketization approach is adopted, which separates the QIs and SAs into two tables: the generalized table (GT) and the sensitive table (ST). The two tables are linked through the bucket ID (BID). (iii) Based on the above points, the experimental results show that our proposed approach provides better privacy and utility than its counterpart.
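The bucketization step described above can be illustrated with a minimal Python sketch. It is not the (c, k)-anonymization algorithm itself: it does not generalize the QIs, does not enforce any privacy measure on the sensitive values, and uses a simple one-to-one bucket correspondence. The function name `bucketize`, the fixed-size grouping of consecutive records, and the column names used with it are assumptions made only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def bucketize(
    records: Sequence[Row],
    qi_cols: Sequence[str],
    sa_cols: Sequence[str],
    k: int,
) -> Tuple[List[Row], List[Row]]:
    """Split records into a generalized table (GT) and a sensitive table (ST).

    Hypothetical helper: consecutive records are grouped into buckets of at
    least k rows, and both tables carry only the bucket ID (BID) as a link.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")

    n_buckets = len(records) // k
    gt: List[Row] = []
    st: List[Row] = []
    for b in range(n_buckets):
        start = b * k
        # The last bucket absorbs the remainder so no bucket has fewer than k rows.
        end = len(records) if b == n_buckets - 1 else start + k
        bucket = records[start:end]
        bid = b + 1

        # GT keeps the QI values (ungeneralized in this sketch) plus the BID.
        for rec in bucket:
            gt.append({"BID": bid, **{c: rec[c] for c in qi_cols}})

        # ST keeps the SA values plus the BID. Rows are sorted by their
        # stringified SA values so their order does not mirror the GT order.
        sa_rows = sorted(tuple(str(rec[c]) for c in sa_cols) for rec in bucket)
        for values in sa_rows:
            st.append({"BID": bid, **dict(zip(sa_cols, values))})
    return gt, st
```

Calling `bucketize(records, ["Age", "Zip"], ["Disease", "Treatment"], k=2)` on a list of dictionaries with those (assumed) keys returns two lists of rows that share only the `BID` column, so a reader of GT can narrow a record down to a bucket in ST but not to a single sensitive row.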