Utility-Based SK-Clustering Algorithm for Privacy Preservation of Anonymized Data in Healthcare

G Shobana, S Shankar

doi:10.2174/2666255813666190920100913

Abstract

Background: The growing need for data publishers to release or share healthcare datasets threatens the privacy and confidentiality of Electronic Medical Records. The goal is to share useful information, maximizing utility while ensuring that sensitive information is not disclosed. A utility-privacy tradeoff always exists and must be handled properly so that researchers can still learn the statistical properties of the datasets.

Objective: This article introduces a novel SK-Clustering algorithm that resists identity disclosure, attribute disclosure, and similarity attacks. The algorithm is evaluated using the discernibility measure and relative error to compare its performance with other clustering algorithms.

Methodology: The SK-Clustering algorithm flexibly adjusts the level of protection to achieve high utility. Cluster size is minimized dynamically according to the level of protection required, and extra tuples are added accordingly. This drastically reduces information loss, thereby increasing data utility.

Results: For a k-value of 50, the discernibility measure of the SK algorithm is 65,000, compared with 70,000 for the Mondrian algorithm and 150,000 for the Anatomy algorithm. Similarly, the relative error of the SK algorithm is less than 10% for a tuple count of 35,000 when compared with other k-anonymity algorithms.

Conclusion: The proposed algorithm achieves a lower discernibility measure and a lower relative error, demonstrating higher data utility than traditionally available algorithms.
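
The abstract names two evaluation metrics without defining them. As a rough illustration only, the sketch below assumes the definitions commonly used in the k-anonymity literature: the discernibility measure as the sum of squared equivalence-class sizes, and relative error as the normalized difference between a query count on the original data and on the anonymized data. The function names, record layout, and sample values are hypothetical and are not taken from the article; the authors' exact formulation may differ.

```python
from collections import Counter


def discernibility(records, quasi_identifiers):
    """Sum of squared equivalence-class sizes (assumed standard definition).

    Records that share the same generalized quasi-identifier values fall
    into one equivalence class; larger classes are penalized quadratically.
    """
    class_sizes = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return sum(size * size for size in class_sizes.values())


def relative_error(true_count, estimated_count):
    """|true - estimated| / true for a count query (assumed definition)."""
    if true_count == 0:
        raise ValueError("relative error is undefined for a true count of 0")
    return abs(true_count - estimated_count) / true_count


# Hypothetical anonymized table: ages and ZIP codes already generalized.
anonymized = [
    {"age": "30-39", "zip": "641**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "641**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "641**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "642**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "642**", "diagnosis": "cancer"},
]

dm = discernibility(anonymized, ["age", "zip"])  # classes of size 3 and 2
err = relative_error(true_count=40, estimated_count=37)
print(dm, err)
```

Under these assumed definitions, lower values of both metrics indicate less information loss, which matches the direction of the comparison reported in the Results.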

Full Text