Securing private information by data perturbation using statistical transformation with three dimensional shearing

G Sathish Kumar, K Premalatha

doi:10.1016/j.asoc.2021.107819

Abstract

Privacy is essential when data are shared for knowledge-based applications, yet storing sensitive data and moving it to other applications raises serious privacy concerns. It is therefore vital to build privacy protection into sensitive data before it is used for data mining. Certain privacy-preserving protocols allow knowledge to be extracted from modified data without revealing the original information. In this work, a series of steps, namely Weight of Evidence, Information Value, Min–Max normalization and 3D shearing, is applied to perturb the quasi-identifiers in the data. Decision Tree, Random Forest, Extreme Gradient Boosting and Support Vector Machine classifiers are applied to the adult income, bank marketing and lung cancer datasets to compare performance on the original and perturbed data. Accuracy, variance, and sensitivity and specificity are used as performance measures for the classifiers. The proposed method is compared with 2D rotation and 3D rotation algorithms. The experimental results show that the proposed method preserves data utility while offering higher data transformation capacity and privacy-preserving capacity than existing geometric transformation techniques.
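The abstract names four steps but gives no formulas, so the following is only a minimal sketch of what each step could look like, not the authors' implementation. It assumes the common definitions of Weight of Evidence (log ratio of non-event share to event share per category) and Information Value, Min–Max scaling to [0, 1], and a unit-diagonal 3D shear matrix applied to triples of attributes. The function names, the smoothing constant `eps`, and the shear factors are all hypothetical; the order and parameters actually used are defined in the full text of the paper.

```python
import math
from collections import Counter


def woe_iv(categories, labels, eps=0.5):
    """Return ({category: WoE}, IV) for one categorical attribute.

    Assumes binary labels (1 = event, 0 = non-event) and the convention
    WoE = ln(share of non-events / share of events). `eps` is a
    hypothetical smoothing count that avoids division by zero.
    """
    events = Counter(c for c, y in zip(categories, labels) if y == 1)
    non_events = Counter(c for c, y in zip(categories, labels) if y == 0)
    cats = set(categories)
    total_ev = sum(events.values()) + eps * len(cats)
    total_non = sum(non_events.values()) + eps * len(cats)
    woe, iv = {}, 0.0
    for c in cats:
        p_ev = (events[c] + eps) / total_ev
        p_non = (non_events[c] + eps) / total_non
        woe[c] = math.log(p_non / p_ev)
        iv += (p_non - p_ev) * woe[c]  # each term is non-negative
    return woe, iv


def min_max(values):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def shear_3d(point, sxy=0.0, sxz=0.0, syx=0.0, syz=0.0, szx=0.0, szy=0.0):
    """Apply a 3D shear: identity matrix plus off-diagonal shear factors.

    The six factors are illustrative; with all of them at 0.0 the point
    is returned unchanged.
    """
    x, y, z = point
    return (x + sxy * y + sxz * z,
            syx * x + y + syz * z,
            szx * x + szy * y + z)


# Illustrative use: three normalized attributes form one 3D point per record.
record = (0.2, 0.5, 0.9)
perturbed = shear_3d(record, sxy=0.5, syz=0.3, szx=0.1)
```

In this reading, categorical quasi-identifiers would first be replaced by their WoE values, all quasi-identifiers would then be normalized, and the shear would finally distort each group of three attributes so that the original values are not published directly.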

Full Text