Abstract

Privacy has become an increasing concern in the publication of datasets that contain sensitive information. Preventing privacy disclosure and providing useful information to legitimate users for data mining are conflicting goals. Generalization and randomized response methods have been proposed in the database community to address this problem. However, both assume the same prior belief for all transactions, which may be an incorrect model and can lead to privacy breaches. Moreover, generalization and randomized response methods usually require a privacy-controlling parameter to balance privacy against data quality, which may put data publishers in a dilemma. In this paper, a novel privacy-preserving method for data publication is proposed, based on conditional probability distributions and machine learning techniques, which can accommodate different prior beliefs for different transactions. A basic cross-sampling algorithm and a complete cross-sampling algorithm are designed for the settings of a single sensitive attribute and multiple sensitive attributes, respectively, and an improved complete algorithm based on Gibbs sampling is developed to enhance data utility when data are insufficient. Our method offers a stronger privacy guarantee while, as extensive experiments show, retaining better data utility.
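As background for the randomized response methods mentioned above, the following is a minimal sketch of the classical mechanism (the reporting probability `p` and the 30% true rate are illustrative assumptions, not values from this paper): each respondent reports the true sensitive value with probability `p` and a coin flip otherwise, and the publisher later inverts the perturbation to recover an unbiased population estimate.

```python
import random

def randomized_response(true_value: bool, p: float = 0.75) -> bool:
    """Report the true sensitive value with probability p,
    otherwise report a uniformly random bit."""
    if random.random() < p:
        return true_value
    return random.random() < 0.5

def estimate_true_proportion(responses, p: float = 0.75) -> float:
    """Unbiased estimate of the true 'yes' proportion pi.
    E[reported] = p*pi + (1-p)*0.5, so pi = (mean - (1-p)/2) / p."""
    mean = sum(responses) / len(responses)
    return (mean - (1 - p) / 2) / p

# Illustrative run: 100,000 respondents with a 30% true 'yes' rate.
random.seed(0)
truth = [random.random() < 0.3 for _ in range(100_000)]
reported = [randomized_response(t) for t in truth]
print(estimate_true_proportion(reported))  # close to 0.3
```

Note that `p` here is exactly the kind of privacy-controlling parameter the abstract criticizes: a larger `p` improves the estimate's accuracy but weakens each individual's deniability.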
