Abstract

With increasing amounts of personal information being collected by various organizations, many privacy models have been proposed for masking the collected data so that it can be published without compromising individual privacy. However, most existing privacy models are not applicable to high-dimensional data, because high-dimensional search spaces are sparse. In this paper, we present our solution for releasing high-dimensional data in a way that supports both privacy preservation and classification analysis. The challenge is how to reduce the dimensionality seen by the privacy model while preserving as much information as possible for classification. Our proposed approach tackles this challenge through vertical partitioning, which divides the raw data vertically into disjoint subsets of attributes, each of smaller dimensionality. Specifically, our partition metric considers both the correlation between attributes and the proportion of attributes in each subset. A generalization method based on local recoding is then applied to each subset separately to achieve k-anonymity. Because optimal k-anonymization is computationally hard, the local recoding method seeks a near-optimal solution in order to improve efficiency. The proposed approach was evaluated on two datasets, and the experimental results showed that it outperformed two related approaches in data utility at the same privacy level.
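
To make the per-subset privacy condition concrete, the following minimal sketch checks k-anonymity separately on each attribute subset of a vertical partition. It is an illustration under stated assumptions, not the paper's method: the function names `is_k_anonymous` and `check_partition`, the toy records, and the chosen partition are all hypothetical, and the sketch covers neither the partition metric nor the local recoding step. It only uses the standard definition of k-anonymity, namely that every combination of attribute values must be shared by at least k records.

```python
from collections import Counter


def is_k_anonymous(records, attributes, k):
    """Return True if every combination of values over `attributes`
    is shared by at least k records (standard k-anonymity definition)."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return all(c >= k for c in counts.values())


def check_partition(records, partition, k):
    """Check k-anonymity separately on each attribute subset.

    `partition` is a list of attribute-name lists; the subsets must be
    disjoint, mirroring the idea of vertical partitioning.
    """
    seen = set()
    for subset in partition:
        if seen & set(subset):
            raise ValueError("attribute subsets must be disjoint")
        seen |= set(subset)
    return {tuple(s): is_k_anonymous(records, s, k) for s in partition}


# Hypothetical toy table whose values are assumed to be already generalized.
records = [
    {"age": "20-29", "zip": "130**", "job": "engineer", "edu": "BSc"},
    {"age": "20-29", "zip": "130**", "job": "teacher", "edu": "BSc"},
    {"age": "30-39", "zip": "148**", "job": "engineer", "edu": "MSc"},
    {"age": "30-39", "zip": "148**", "job": "teacher", "edu": "MSc"},
]

# Hypothetical partition into two disjoint attribute subsets.
partition = [["age", "zip"], ["job", "edu"]]
result = check_partition(records, partition, k=2)
```

In this toy table the subset (age, zip) already satisfies 2-anonymity, the subset (job, edu) would need further generalization, and the full four-attribute table is not 2-anonymous because every record is unique. This shows why a condition checked on low-dimensional subsets is easier to meet than one checked on all attributes at once.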
