An Effective Multi-clustering Anonymization Approach Using Discrete Component Task for Non Binary High Dimensional Data Spaces

L.V Arun Shalin, K Prasadh

doi:10.1016/j.protcy.2016.08.099

Abstract

Clustering is the process of grouping elements so that the elements assigned to the same cluster are more similar to each other than to the remaining data points. High dimensional data raises many well-known difficulties. Existing research on anonymization methods for high dimensional data spaces has not addressed dimensionality reduction for non-binary databases. This paper presents the Discrete Component Task Specific Multi-Clustering (DCTSM) approach for dimensionality reduction on non-binary databases. First, the attributes of the non-binary database are analyzed, and a cluster projection step identifies the degree of sparseness of each dimension. Then, using a quantum distribution over the multi-cluster dimensions, the approach addresses attribute relevancy and redundancy in non-binary data spaces. The resulting dimensionality reduction improves performance through tag-based features. Multi-clustering tag-based feature reduction extracts individual features and replaces them with the equivalent feature clusters (i.e., tag clusters). During training, the DCTSM approach uses the multi clusters in place of the individual tag features; during decoding, the individual features are likewise replaced by the corresponding multi clusters. To measure the effectiveness of the method, experiments were conducted on the Statlog German Credit Data Set, comparing the DCTSM approach against an existing anonymization method for high dimensional data spaces. The DCTSM approach improved accuracy by 7.05%, required minimal time for tag feature extraction, and produced a lower error rate.
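The replacement of individual tag features by tag clusters can be illustrated with a minimal sketch. It is not the DCTSM implementation: the tag names, the cluster labels, and the `TAG_TO_CLUSTER` mapping are all hypothetical, and the clustering step that would produce such a mapping is assumed to have run already. The sketch only shows how substituting cluster labels for individual categorical values shrinks the feature space.

```python
# Hypothetical tag -> cluster mapping (assumption: produced by an earlier
# clustering step that is not shown here).
TAG_TO_CLUSTER = {
    "car_new": "purpose_vehicle",
    "car_used": "purpose_vehicle",
    "furniture": "purpose_household",
    "radio_tv": "purpose_household",
    "education": "purpose_personal",
    "retraining": "purpose_personal",
}


def to_cluster_features(tags, tag_to_cluster):
    """Replace each tag with its cluster label; unmapped tags are kept as-is."""
    return [tag_to_cluster.get(tag, tag) for tag in tags]


def feature_space_size(rows):
    """Count the distinct feature values that occur across all rows."""
    return len({feature for row in rows for feature in row})


# Made-up records, each holding a few categorical (non-binary) tag features.
rows = [
    ["car_new", "education"],
    ["car_used", "furniture"],
    ["radio_tv", "retraining"],
]

clustered_rows = [to_cluster_features(row, TAG_TO_CLUSTER) for row in rows]

print(feature_space_size(rows), "distinct features before replacement")
print(feature_space_size(clustered_rows), "distinct features after replacement")
```

In this toy setting the same mapping would be applied both when training a model and when decoding new records, so that both stages operate on cluster labels rather than on the individual tags.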

Full Text