Abstract

Privacy-Preserving Data Publishing (PPDP) has become a critical issue for companies and organizations that release their data. k-Anonymization was proposed as the first generalization model to guard against identity disclosure of individual records in a data set. Point access methods (PAMs) have not been well studied for the problem of data anonymization. In this article, we propose yet another approximation algorithm for anonymization, called BangA, that combines useful features from PAMs and clustering. As a result, it achieves the fast computation and scalability of a PAM, and very high quality thanks to its density-based clustering step. Extensive experiments demonstrate the efficiency and effectiveness of our approach. Furthermore, we provide guidelines for extending BangA to achieve a relaxed form of differential privacy, which provides stronger privacy guarantees than traditional privacy definitions.
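As a point of reference for the k-anonymity model named in the abstract, the following minimal Python sketch checks whether a table is k-anonymous with respect to a chosen set of quasi-identifiers, and applies a simple hand-written generalization (age to a 10-year range, zip code to a 3-digit prefix). It is not the BangA algorithm; the records, attribute names, and generalization rules are illustrative assumptions.

```python
from collections import Counter

# Hypothetical quasi-identifier attributes; "disease" is the sensitive attribute.
QUASI_IDENTIFIERS = ("age", "zip", "sex")


def equivalence_class_sizes(records, qi=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in qi) for r in records)


def is_k_anonymous(records, k, qi=QUASI_IDENTIFIERS):
    """A table is k-anonymous if every quasi-identifier combination occurs at least k times."""
    return all(size >= k for size in equivalence_class_sizes(records, qi).values())


def generalize(record):
    """Illustrative (hypothetical) generalization: age -> 10-year range, zip -> 3-digit prefix."""
    low = (record["age"] // 10) * 10
    return {
        **record,
        "age": f"{low}-{low + 9}",
        "zip": record["zip"][:3] + "**",
        # "sex" is kept unchanged in this toy example
    }


records = [
    {"age": 34, "zip": "13053", "sex": "F", "disease": "flu"},
    {"age": 37, "zip": "13068", "sex": "F", "disease": "cancer"},
    {"age": 52, "zip": "14850", "sex": "M", "disease": "asthma"},
    {"age": 58, "zip": "14853", "sex": "M", "disease": "flu"},
]

print(is_k_anonymous(records, k=2))      # False: every raw record is unique
generalized = [generalize(r) for r in records]
print(is_k_anonymous(generalized, k=2))  # True: two classes of two records each
```

The generalization rule above is fixed by hand; anonymization algorithms instead search for groupings or generalizations automatically, trading privacy against information loss. The sketch only illustrates the property being enforced.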

Highlights

  • To sum up the above discussion, we argue that every Privacy-Preserving Data Publishing (PPDP) task should meet at least a set of theoretical and practical requirements in order to be valuable for the end-user.

  • To make a verifiable comparison of both approaches, a set of 1 million tuples with seven quasi-identifier attributes was randomly sampled from the Customer data set.

  • We proposed a new generalization algorithm called BangA.


Summary

Introduction

Organizations in domains such as health care typically gather this information to improve the quality of their services. Given the interdependence of the Internet and information systems, such sensitive data is exposed to theft and corruption.

Organizations may release their microdata to facilitate useful data analysis and research. They usually remove uniquely identifying information, such as name or SSN, from the published data. However, this sanitization may not be enough to protect the privacy of individuals, as it may still be possible to link released records back to their identities by matching some combination of attributes, such as age, zip code, and sex, called quasi-identifiers or linking attributes. These attributes can then be used to infer an individual's sensitive attributes, e.g., their disease.
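To make the linking attack described in the introduction concrete, the following Python sketch joins a hypothetical released table (names removed) with a hypothetical public list that still contains names, matching on age, zip code, and sex. All records, names, and the `link` helper are illustrative assumptions, not data or code from the article.

```python
# Hypothetical released microdata: names and SSNs removed, quasi-identifiers kept.
released = [
    {"age": 34, "zip": "13053", "sex": "F", "disease": "flu"},
    {"age": 52, "zip": "14850", "sex": "M", "disease": "asthma"},
]

# Hypothetical public data set (e.g., a voter list) that still contains names.
public = [
    {"name": "Alice", "age": 34, "zip": "13053", "sex": "F"},
    {"name": "Bob", "age": 52, "zip": "14850", "sex": "M"},
    {"name": "Carol", "age": 29, "zip": "13053", "sex": "F"},
]

QUASI_IDENTIFIERS = ("age", "zip", "sex")


def link(released, public, qi=QUASI_IDENTIFIERS):
    """Re-identify released records whose quasi-identifiers match exactly one public record."""
    # Index public records by their quasi-identifier values.
    index = {}
    for person in public:
        key = tuple(person[a] for a in qi)
        index.setdefault(key, []).append(person["name"])
    matches = {}
    for rec in released:
        names = index.get(tuple(rec[a] for a in qi), [])
        if len(names) == 1:  # a unique match discloses the identity
            matches[names[0]] = rec["disease"]
    return matches


print(link(released, public))  # {'Alice': 'flu', 'Bob': 'asthma'}
```

A released record is re-identified whenever its quasi-identifier values match exactly one person in the public list; k-anonymity blocks this by ensuring every such combination is shared by at least k records.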
