An Efficient Clustering for Anonymizing Data and Protecting Sensitive Labels

J Jesu Vedha Nayahi, V Kavitha

doi:10.1142/s0218488515500300

Abstract

Sharing data with a third party for the mutual benefit of the data owners is a vital need across computer applications. Ensuring the privacy of sensitive data is an important issue in data sharing, and hence there is a need for privacy-preserving data mining algorithms. Traditional privacy-preserving algorithms are prone to attacks, incur high information loss, and sometimes fail to satisfy the privacy constraints. In this paper, a (G,S) clustering algorithm is proposed that forms clusters, each containing S diverse values of the sensitive attribute. The number of distinct sensitive values in each cluster is exactly equal to the number of distinct sensitive values in the given data set. The clusters formed are anonymized by replacing the actual values with the centroid (G) of the cluster. The algorithm achieves the maximum possible diversity of sensitive values in each cluster, making it resilient to the similarity attack. The method guarantees a high degree of privacy with minimal loss of information. The privacy degree of the clusters formed is at least equal to the number of distinct values in the sensitive attribute. The worst-case linking probability would be as low as 0.5. The significance of this algorithm is its resilience to the similarity attack. The performance of the algorithm is shown to be better than that of existing algorithms in terms of KL divergence, the discernibility metric, and a few other metrics.
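The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' (G,S) algorithm: the function names `diverse_clusters` and `anonymize`, the tuple layout of a record, and the round-robin assignment are all assumptions made for illustration. In particular, the sketch ignores the distance between records when grouping them, which a real method would likely take into account to limit information loss. It only shows the two properties the abstract states: every cluster contains every distinct sensitive value, and the non-sensitive values are replaced by the cluster centroid.

```python
from collections import defaultdict


def diverse_clusters(records, s_index):
    """Hypothetical helper: build clusters that each contain every
    distinct sensitive value found in the data set."""
    by_value = defaultdict(list)
    for rec in records:
        by_value[rec[s_index]].append(rec)
    # The rarest sensitive value limits how many clusters can be formed.
    n_clusters = min(len(group) for group in by_value.values())
    clusters = [[] for _ in range(n_clusters)]
    for group in by_value.values():
        for i, rec in enumerate(group):
            # Round-robin assignment (an assumption): surplus records wrap
            # around, so every cluster still holds every sensitive value.
            clusters[i % n_clusters].append(rec)
    return clusters


def anonymize(cluster, s_index):
    """Hypothetical helper: replace each numeric non-sensitive attribute
    with the cluster centroid and keep the sensitive value unchanged."""
    width = len(cluster[0])
    centroid = {
        j: sum(rec[j] for rec in cluster) / len(cluster)
        for j in range(width)
        if j != s_index
    }
    return [
        tuple(rec[j] if j == s_index else centroid[j] for j in range(width))
        for rec in cluster
    ]


# Toy data (invented for illustration): (age, income, diagnosis).
records = [
    (25, 50000, "flu"),
    (30, 60000, "cancer"),
    (35, 55000, "flu"),
    (40, 65000, "cancer"),
]
clusters = diverse_clusters(records, s_index=2)
released = [anonymize(c, s_index=2) for c in clusters]
```

In this toy data set there are two distinct sensitive values, so each cluster holds both of them, and the records within a cluster become indistinguishable on age and income after the centroid replacement.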
