Abstract

Privacy Preserving Data Mining (PPDM) is the extraction of hidden patterns or knowledge from large volumes of data without revealing sensitive personal or business information. The data used for data mining may contain sensitive attributes such as Social Security number, salary, name, and credit card number. Disclosure of such information is a threat to the privacy of individuals. The aim of PPDM is to protect the sensitive information in the data used for data mining. Several methods have been developed based on anonymization, perturbation, and cryptography. All of these methods take a list of sensitive attributes as input from the data owner. A further limitation is that they transform the data without considering how sensitive each attribute is. We propose a framework for PPDM based on anonymization guided by the sensitivity rank of each attribute. The framework also identifies the sensitive attributes in the data automatically. The proposed work, PrivGuard: Sensitivity Guided Anonymization based PPDM with Automatic Selection of Sensitive Attributes, finds sensitive attributes in the database by computing a Sensitivity Rank for each attribute. To do so, it first ranks each attribute using attribute evaluation measures such as Information Gain, Symmetric Uncertainty, Gain Ratio, and OneR. It then computes the sensitivity rank and uses it to decide how much anonymization is required to provide privacy. The method balances data privacy against data utility by applying an appropriate level of anonymization using the taxonomy tree of the attribute. The level of anonymization is determined by a generalization score based on the attribute's sensitivity rank. Finally, C4.5 and Naive Bayes classifiers are built on the anonymized data, and the results are compared with those of other anonymization methods. Our method outperforms existing methods, and its results are very close to those obtained by data mining on the original data.
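
The pipeline summarized above (score attributes, rank them, map the rank to a generalization level, then climb the taxonomy tree) can be illustrated with a minimal Python sketch. It is not the PrivGuard implementation. It uses only Information Gain, whereas the abstract names several measures, and it assumes that a higher score means a more sensitive attribute and that the generalization level falls linearly with rank. All function names, the toy records, and the job taxonomy are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def information_gain(rows, attr, label):
    """Information gain of `attr` with respect to the class column `label`."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[label] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def sensitivity_ranks(rows, attrs, label):
    """Rank attributes by information gain; rank 1 is the highest score.

    ASSUMPTION: a higher score is treated as more sensitive. The abstract
    does not say how the evaluation measures are turned into a rank.
    """
    scores = {a: information_gain(rows, a, label) for a in attrs}
    ordered = sorted(attrs, key=lambda a: scores[a], reverse=True)
    return {a: i + 1 for i, a in enumerate(ordered)}


def generalization_level(rank, n_attrs, max_level):
    """Map a rank to a number of taxonomy levels to climb.

    ASSUMPTION: a linear mapping, so rank 1 gets `max_level` and the
    last rank gets 0. The paper's generalization score may differ.
    """
    if n_attrs <= 1:
        return max_level
    return round(max_level * (n_attrs - rank) / (n_attrs - 1))


def generalize(value, parent_of, levels):
    """Climb `levels` steps up a taxonomy given as a child -> parent dict."""
    for _ in range(levels):
        if value not in parent_of:
            break  # already at the root of the taxonomy
        value = parent_of[value]
    return value


# Hypothetical toy data: `job` determines `income`, `city` carries no signal.
ROWS = [
    {"job": "Engineer", "city": "Paris", "income": "high"},
    {"job": "Engineer", "city": "Lyon", "income": "high"},
    {"job": "Clerk", "city": "Paris", "income": "low"},
    {"job": "Clerk", "city": "Lyon", "income": "low"},
]

# Hypothetical taxonomy tree for `job`, stored as child -> parent.
JOB_PARENT = {
    "Engineer": "Technical",
    "Clerk": "Administrative",
    "Technical": "Any",
    "Administrative": "Any",
}

if __name__ == "__main__":
    ranks = sensitivity_ranks(ROWS, ["job", "city"], "income")
    level = generalization_level(ranks["job"], n_attrs=2, max_level=1)
    for row in ROWS:
        print(generalize(row["job"], JOB_PARENT, level), row["city"], row["income"])
```

In this toy data, `job` gets rank 1 and is generalized one level up its taxonomy, while `city` would be left unchanged. The intent is only to show how a rank-driven level can trade privacy against utility per attribute.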
