Abstract

Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data mining tasks. Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows privacy-sensitive data to be shared for analysis purposes. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, and (α,k)-anonymity. In particular, all known mechanisms try to minimize information loss, and this very attempt provides a loophole for attacks. The aim of this paper is to present a survey of the most common attack techniques for anonymization-based PPDM and PPDP and to explain their effects on data privacy.

Highlights

  • Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy.

  • This paper presents a survey of the most common attack techniques for anonymization-based privacy-preserving data mining (PPDM) and privacy-preserving data publishing (PPDP) and explains their effects on data privacy. k-anonymity protects the identity of respondents and reduces linking attacks, but a simple k-anonymity model fails under a homogeneity attack; l-diversity is the concept introduced to prevent this attack.

  • In an l-diverse table, the sensitive values within each group of tuples are well represented, so an adversary is diverted to at least l places, i.e., l distinct sensitive values. l-diversity is still limited against a background knowledge attack, because no one can predict the knowledge level of an adversary.
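The following is a minimal Python sketch, not the paper's method, of the simplest reading of "well represented", distinct l-diversity (every group of tuples sharing the same generalized QID holds at least l distinct sensitive values), and of how background knowledge can still defeat it. The table, its generalized values, and the functions `equivalence_classes`, `is_distinct_l_diverse` and `remaining_candidates` are hypothetical illustrations; the `ruled_out` parameter is an assumed stand-in for whatever the adversary already knows.

```python
from collections import defaultdict

# Hypothetical table, already generalized on the QID {Zipcode, Age}.
# Each record is (zipcode, age, disease); Disease is the sensitive attribute.
TABLE = [
    ("476**", "2*", "Flu"),
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Cancer"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Cancer"),
    ("4790*", ">=40", "Cancer"),
]
QID = (0, 1)     # column indexes of the quasi-identifier
SENSITIVE = 2    # column index of the sensitive attribute


def equivalence_classes(table):
    """Group records that share the same generalized QID values."""
    groups = defaultdict(list)
    for row in table:
        groups[tuple(row[i] for i in QID)].append(row)
    return dict(groups)


def is_distinct_l_diverse(table, l):
    """Distinct l-diversity: every group holds at least l distinct sensitive values."""
    return all(
        len({row[SENSITIVE] for row in rows}) >= l
        for rows in equivalence_classes(table).values()
    )


def remaining_candidates(table, target_qid, ruled_out=()):
    """Sensitive values the adversary still cannot exclude for a target.

    `ruled_out` (hypothetical) models background knowledge: values the
    adversary already knows the target does NOT have.
    """
    rows = equivalence_classes(table).get(target_qid, [])
    return {row[SENSITIVE] for row in rows} - set(ruled_out)


if __name__ == "__main__":
    print(is_distinct_l_diverse(TABLE, 2))                         # True
    print(sorted(remaining_candidates(TABLE, ("4790*", ">=40"))))  # ['Cancer', 'Flu']
    # Background knowledge: the adversary knows the target does not have flu.
    print(sorted(remaining_candidates(TABLE, ("4790*", ">=40"), ruled_out={"Flu"})))  # ['Cancer']
```

In this toy table both groups are 2-diverse, yet an adversary who knows that a target in the `("4790*", ">=40")` group does not have flu is left with a single candidate, Cancer. Stronger instantiations of "well represented" (such as entropy l-diversity) exist; distinct l-diversity is used here only because it is the easiest to read.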

Summary

Introduction

Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy. Studies have been made to ensure that the sensitive information of individuals cannot be identified. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity [2], (α,k)-anonymity [3], and t-closeness [4]. These models assume that the data or table T contains (1) a quasi-identifier (QID), which is a set of attributes in T (e.g., {Date of birth, Zipcode, Sex}) that can be used to identify an individual, and (2) sensitive attributes, i.e., attributes in T that may contain sensitive values of individuals (e.g., HIV for the attribute Disease). While k-anonymity protects against identity disclosure, it does not provide sufficient protection against attribute disclosure through the homogeneity attack and the background knowledge attack.
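As a hedged illustration of the table model above, the Python sketch below (not taken from the paper) checks k-anonymity over a QID and flags groups that are open to the homogeneity attack. The records, the generalizations of Date of birth (to the year) and Zipcode (to a prefix), and the helpers `qid_key`, `is_k_anonymous` and `homogeneous_classes` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical table T. QID = {Date of birth, Zipcode, Sex}; date of birth is
# generalized to the year and zipcode to a prefix. Disease is sensitive.
T = [
    {"dob": "1965", "zip": "0214*", "sex": "M", "disease": "HIV"},
    {"dob": "1965", "zip": "0214*", "sex": "M", "disease": "HIV"},
    {"dob": "1965", "zip": "0214*", "sex": "M", "disease": "HIV"},
    {"dob": "1964", "zip": "0213*", "sex": "F", "disease": "Flu"},
    {"dob": "1964", "zip": "0213*", "sex": "F", "disease": "Cancer"},
    {"dob": "1964", "zip": "0213*", "sex": "F", "disease": "Flu"},
]
QID = ("dob", "zip", "sex")
SENSITIVE = "disease"


def qid_key(row):
    """The record's (generalized) QID values as a hashable tuple."""
    return tuple(row[attr] for attr in QID)


def is_k_anonymous(table, k):
    """True if every QID combination occurring in the table is shared by >= k records."""
    counts = Counter(qid_key(row) for row in table)
    return all(count >= k for count in counts.values())


def homogeneous_classes(table):
    """Map each QID group whose records all share one sensitive value to that value.

    An adversary who knows that a target falls into such a group learns the
    target's sensitive value without identifying the target's record.
    """
    values = defaultdict(set)
    for row in table:
        values[qid_key(row)].add(row[SENSITIVE])
    return {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}


if __name__ == "__main__":
    print(is_k_anonymous(T, 3))    # True: both QID groups hold 3 records
    print(homogeneous_classes(T))  # {('1965', '0214*', 'M'): 'HIV'}
```

The table is 3-anonymous, so no record can be singled out by its QID alone, yet anyone who knows that a target is a man born in 1965 living in a 0214* zipcode learns that the target has HIV. This is the attribute disclosure that k-anonymity by itself does not prevent.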

Extending Models Since k-anonymity does not provide sufficient protection
Related Research Areas
Privacy-Preserving Data Publishing (PPDP) Attacks
Homogeneity Attack and Background Knowledge Attack
Background
Conclusion