Abstract

Privacy has become an increasing concern in the publication of datasets that contain sensitive information. Preventing privacy disclosure and providing useful information to legitimate users for data mining are conflicting goals. Generalization and randomized response methods have been proposed in the database community to address this problem. However, both assume the same prior belief for all transactions, which may be an incorrect model and can lead to privacy breaches. Moreover, generalization and randomized response methods usually require a privacy-controlling parameter to balance privacy against data quality, which may put data publishers in a dilemma. In this paper, a novel privacy-preserving method for data publication is proposed, based on conditional probability distributions and machine learning techniques, which can accommodate different prior beliefs for different transactions. A basic cross-sampling algorithm and a complete cross-sampling algorithm are designed for the settings of a single sensitive attribute and multiple sensitive attributes, respectively, and an improved complete algorithm based on Gibbs sampling is developed to enhance data utility when data are insufficient. Our method offers a stronger privacy guarantee while, as extensive experiments show, retaining better data utility.
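As background for the randomized response methods mentioned above, the following is a minimal sketch of the classical mechanism (the reporting probability `p` and the 30% true rate are illustrative assumptions, not values from this paper): each respondent reports the true sensitive value with probability `p` and a coin flip otherwise, and the publisher later inverts the perturbation to recover an unbiased population estimate.

```python
import random

def randomized_response(true_value: bool, p: float = 0.75) -> bool:
    """Report the true sensitive value with probability p,
    otherwise report a uniformly random bit."""
    if random.random() < p:
        return true_value
    return random.random() < 0.5

def estimate_true_proportion(responses, p: float = 0.75) -> float:
    """Unbiased estimate of the true 'yes' proportion pi.
    E[reported] = p*pi + (1-p)*0.5, so pi = (mean - (1-p)/2) / p."""
    mean = sum(responses) / len(responses)
    return (mean - (1 - p) / 2) / p

# Illustrative run: 100,000 respondents with a 30% true 'yes' rate.
random.seed(0)
truth = [random.random() < 0.3 for _ in range(100_000)]
reported = [randomized_response(t) for t in truth]
print(estimate_true_proportion(reported))  # close to 0.3
```

Note that `p` here is exactly the kind of privacy-controlling parameter the abstract criticizes: a larger `p` improves the estimate's accuracy but weakens each individual's deniability.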
