Anonymized noise addition in subspaces for privacy preserved data mining in high dimensional continuous data

Shashidhar Virupaksha, Venkatesulu Dondeti

doi:10.1007/s12083-021-01080-y

Abstract

Data privacy is a major concern in data mining, and privacy-preserving data mining algorithms are used to protect it. However, applying privacy-preserving data mining to high-dimensional continuous data causes high data loss and information loss, and it makes clusters very difficult to identify. In this paper, a novel technique, Anonymized Noise Addition in Subspaces (ANAS), is proposed that reduces data loss and information loss while improving cluster identification and privacy. Anonymization by aggregation is performed in dense and non-dense subspaces based on Euclidean distances to reduce data loss and enhance privacy. Random noise bounded by the subspace limits is then added to the anonymized subspaces to improve cluster identification and reduce data loss. ANAS was run on benchmark datasets: on sparse datasets it identifies 80% of the clusters in the original data, whereas the existing techniques identify none. ANAS reduces data loss by 50% and information loss by 20%, and enhances privacy by 40%.
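The abstract does not spell out the exact ANAS procedure, so the following is only a minimal sketch of the two stages it names, under stated assumptions. A subspace is assumed to be a given list of column indices (how dense and non-dense subspaces are detected is not modeled). Aggregation is assumed to be a simple greedy microaggregation that replaces each record by the centroid of a group of at least `k` records that are close by Euclidean distance. Noise is assumed to be uniform, scaled by a hypothetical `fraction` of each dimension's range, and clipped back into the subspace's original minimum and maximum. All function names and parameters (`aggregate_subspace`, `add_bounded_noise`, `anonymize_subspace`, `k`, `fraction`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(group: Sequence[Sequence[float]]) -> Point:
    n = len(group)
    return [sum(col) / n for col in zip(*group)]


def aggregate_subspace(points: Sequence[Sequence[float]], k: int) -> List[Point]:
    """Assumed greedy microaggregation: each record becomes the centroid
    of a group of >= k records chosen by Euclidean distance."""
    if k < 1 or len(points) < k:
        raise ValueError("need k >= 1 and at least k points")
    remaining = list(range(len(points)))
    result: List[Point] = [[] for _ in points]
    while remaining:
        if len(remaining) < 2 * k:
            group = remaining  # leftovers form one final group of size k..2k-1
        else:
            seed = remaining[0]
            # seed plus its k-1 nearest remaining neighbours
            group = sorted(remaining, key=lambda i: euclidean(points[i], points[seed]))[:k]
        c = centroid([points[i] for i in group])
        for i in group:
            result[i] = list(c)
        chosen = set(group)
        remaining = [i for i in remaining if i not in chosen]
    return result


def add_bounded_noise(points, lows, highs, fraction, rng) -> List[Point]:
    """Add uniform noise of up to `fraction` of each dimension's range,
    then clip every value into the subspace limits [low, high]."""
    noisy = []
    for p in points:
        row = []
        for x, lo, hi in zip(p, lows, highs):
            width = (hi - lo) * fraction
            y = x + rng.uniform(-width, width)
            row.append(min(max(y, lo), hi))
        noisy.append(row)
    return noisy


def anonymize_subspace(data, dims, k, fraction, seed=0):
    """Aggregate, then perturb, only the columns listed in `dims`."""
    rng = random.Random(seed)
    sub = [[row[d] for d in dims] for row in data]
    lows = [min(col) for col in zip(*sub)]   # subspace limits from the original data
    highs = [max(col) for col in zip(*sub)]
    noisy = add_bounded_noise(aggregate_subspace(sub, k), lows, highs, fraction, rng)
    out = [list(row) for row in data]
    for r, row in enumerate(noisy):
        for j, d in enumerate(dims):
            out[r][d] = row[j]
    return out
```

For example, `anonymize_subspace(data, dims=[0, 1], k=3, fraction=0.05)` anonymizes only columns 0 and 1 and leaves the other columns untouched. Clipping to the original limits keeps the perturbed values inside the subspace, which is one plausible reading of "random noise within the subspace limits"; the paper may bound the noise differently.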

Full Text