Privacy-preserving high-dimensional data publishing for classification

Rong Wang, Yan Zhu, Chin-Chen Chang, Qiang Peng

doi:10.1016/j.cose.2020.101785

Abstract

With increasing amounts of personal information being collected by various organizations, many privacy models have been proposed for masking the collected data so that the data can be published without compromising individual privacy. However, most existing privacy models are not applicable to high-dimensional data because of the sparseness of the high-dimensional search space. In this paper, we present our solution for releasing high-dimensional data that supports both privacy preservation and classification analysis. The challenge is how to reduce dimensionality from the perspective of privacy models while preserving as much information as possible for classification. Our proposed approach tackles this challenge through vertical partitioning: the raw data are divided vertically into disjoint subsets of smaller dimensionality. Specifically, our partition metric considers both the correlation between attributes and the proportion of attributes in each subset. A generalization method based on local recoding is then applied to each subset separately to achieve k-anonymity. Because an optimal implementation of k-anonymity is computationally hard, the local recoding method finds a near-optimal solution with the goal of improving efficiency. The proposed approach was evaluated using two datasets, and the experimental results showed that it outperformed two related approaches in data utility at the same privacy level.
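
The sparseness problem and the effect of vertical partitioning can be illustrated with a small sketch. This is not the authors' algorithm: the partition below is chosen by hand rather than by the paper's correlation-based metric, no local recoding is performed, and the function names, attribute names, and toy records are all hypothetical. It only checks whether a table, or each vertical subset of it, already satisfies k-anonymity.

```python
from collections import Counter

def is_k_anonymous(records, attrs, k):
    """True if every value combination over `attrs` occurs at least k times."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return all(c >= k for c in counts.values())

def check_partition(records, partition, k):
    """Check k-anonymity separately on each vertical subset of attributes."""
    return {tuple(subset): is_k_anonymous(records, subset, k)
            for subset in partition}

# Hypothetical toy table with already-generalized values.
records = [
    {"age": "30-39", "zip": "476**", "sex": "F", "job": "nurse"},
    {"age": "30-39", "zip": "476**", "sex": "M", "job": "clerk"},
    {"age": "40-49", "zip": "479**", "sex": "F", "job": "nurse"},
    {"age": "40-49", "zip": "479**", "sex": "M", "job": "clerk"},
]

# Hand-picked vertical partition (an assumption for illustration only).
partition = [["age", "zip"], ["sex", "job"]]

full = is_k_anonymous(records, ["age", "zip", "sex", "job"], k=2)
result = check_partition(records, partition, k=2)
```

In this toy table every record is unique over all four attributes, so the full table is not 2-anonymous, while each two-attribute subset is. That is the intuition behind anonymizing lower-dimensional subsets separately.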

Journal: Computers & Security
Publication Date: Mar 3, 2020
Citations: 19

Similar Papers

M-Denclue for Effective Data Clustering in High Dimensional Non-Linear Data
International Journal of Innovative Technology and Exploring Engineering | VOL. 9
10 Nov 2019

The Risk-Utility Tradeoff for Data Privacy Models
M Moein Almasi ... Hadi Hemmati
01 Nov 2016

Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud
Xuyun Zhang ... Surya Nepal
IEEE Transactions on Computers | VOL. 64
01 Aug 2015

A multivariate feature selection framework for high dimensional biomedical data classification
Abeer Alzubaidi ... Georgina Cosma
01 Aug 2017
