Abstract

Releasing raw data sets that contain sensitive personal information can disclose private information. Various differential privacy methods have therefore been proposed for sharing data efficiently while preserving privacy. However, these methods add noise to all quasi-identifier attributes, which results in high time and space complexity and low data utility. In this paper, we propose a Differential Privacy Protection model considering the Correlations between Attributes, denoted DPPCA. DPPCA first computes the degree of correlation between the quasi-identifier attributes and the sensitive attributes, and determines the pair of attributes with the maximal degree of correlation. Then, based on the attributes with the maximal degree of correlation, it uses microaggregation to partition the data set into clusters of size $k$ ( $k\geq 2$ ) according to three types of attributes, i.e., numerical, non-numerical, and hybrid attributes, such that there are $l$ ( $l ) values of sensitive attributes in a cluster. Finally, noise is added to each cluster separately such that each cluster satisfies $\varepsilon $ -differential privacy. Our experimental results demonstrate that, while keeping the same degree of privacy preservation, DPPCA substantially reduces the amount of added noise to 11% for the Census data set and the Adult data set. DPPCA therefore greatly improves data utility while reaching the same degree of differential privacy.
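The three steps described in the abstract (pick the most correlated attribute, microaggregate into clusters of at least $k$ records, add noise per cluster) can be sketched for a single numerical attribute as follows. This is a minimal illustration and not the authors' implementation: the function names, the use of Pearson correlation, the sort-and-cut partitioning, and the noise scale (attribute range divided by cluster size and by $\varepsilon$) are all assumptions made for the example. The handling of non-numerical and hybrid attributes and of the $l$ constraint is omitted.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def most_correlated(quasi_identifiers, sensitive):
    """Name of the quasi-identifier column with the largest |r| (assumed measure)."""
    return max(quasi_identifiers,
               key=lambda name: abs(pearson(quasi_identifiers[name], sensitive)))


def partition_sorted(values, k):
    """Sort record indices by value and cut them into clusters of size >= k."""
    if len(values) < k:
        raise ValueError("need at least k records")
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [order[i:i + k] for i in range(0, len(order), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        # Merge a short trailing cluster into the previous one.
        clusters[-2].extend(clusters.pop())
    return clusters


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_centroids(values, clusters, epsilon, lo, hi, rng):
    """Replace each cluster by its centroid plus Laplace noise.

    Assumed calibration: with fixed cluster membership, the mean of m values
    bounded in [lo, hi] changes by at most (hi - lo) / m when one record changes.
    """
    result = []
    for members in clusters:
        centroid = sum(values[i] for i in members) / len(members)
        scale = (hi - lo) / len(members) / epsilon
        result.append(centroid + laplace(scale, rng))
    return result
```

Because the noise scale shrinks with the cluster size, larger clusters need less noise per centroid but discard more within-cluster detail; the choice of $k$ trades one against the other.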

Highlights

  • Data-sharing mechanisms make cooperation and research among various organizations convenient but substantially increase the risk of privacy disclosure [1]–[3]

  • This paper proposes a differentially private data release via microaggregation that uses the association attributes. Specifically, we investigate how to effectively reduce the sensitivity of the dataset and improve the signal-to-noise ratio while realizing differential privacy

  • We propose a model for publishing private data using microaggregated association attribute differences, referred to as DPPCA

Summary

INTRODUCTION

G. Yang et al.: Associated Attribute-Aware Differentially Private Data Publishing via Microaggregation

Data-sharing mechanisms make cooperation and research among various organizations convenient but substantially increase the risk of privacy disclosure [1]–[3]. [...] the noise that is added to the output is substantially increased, and the data utility is mainly limited to the release of the query results [7]–[9]. It is challenging to prevent the disclosure of private data and to improve the utility of published data. To solve these problems, the noninteractive protection framework [10] converts or compresses the original data and subsequently adds noise to the query results to satisfy ε-differential privacy. Soria-Comas et al. [12] proposed an insensitive microaggregation algorithm for increasing the within-cluster homogeneity. They considered the sensitivity of the input datasets to changes in only one record.
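For reference, the standard definitions behind the terms used above can be written as follows. The notation ($M$, $D$, $D'$, $f$, $\Delta f$) is the conventional one and is assumed here; the paper's own symbols may differ. A randomized mechanism $M$ satisfies ε-differential privacy if, for every pair of datasets $D$ and $D'$ that differ in one record and every set of outputs $S$,

$$\Pr[M(D) \in S] \leq e^{\varepsilon} \, \Pr[M(D') \in S].$$

The sensitivity of a query $f$ to a change in one record, and the Laplace mechanism calibrated to it, are

$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1, \qquad M(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right).$$

The noise scale grows linearly with $\Delta f$, which is why reducing the sensitivity (for example, by releasing cluster centroids instead of individual records) reduces the amount of noise needed for the same $\varepsilon$.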

RELATED WORK
CATEGORICAL ATTRIBUTE DIFFERENTIALLY PRIVATE DATA PUBLISHING METHOD
DIFFERENTIALLY PRIVATE DATA RELEASE FOR HYBRID ATTRIBUTES
RESULTS AND ANALYSIS
Findings
CONCLUSION
