Abstract

Microaggregation is a successful mechanism for reconciling the tension between respondent privacy and data quality in Statistical Disclosure Control. For numerical datasets, microaggregation is defined as a clustering problem in which each group must contain at least k records and the within-group sum of squared errors (SSE) is minimized. Unfortunately, the data publisher has to run a microaggregation algorithm repeatedly for different values of k to find a good trade-off between privacy and utility. Running an algorithm many times on large numerical datasets wastes resources, since most of the computations are repeated from one run to the next. In this paper, we propose a Fast Data-oriented Microaggregation algorithm (FDM) that efficiently anonymizes large multivariate numerical datasets for multiple successive values of k. Experimental results on real-world datasets demonstrate the superiority of the method in terms of both data quality and time complexity. Moreover, compared with previous techniques, the method usually achieves a better trade-off between the disclosure risk and the information loss of the protected dataset.
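
To make the objective concrete, the following minimal Python sketch evaluates a given partition against the two conditions in the definition above: every group holds at least k records, and quality is measured by the within-group SSE. It is not the FDM algorithm, whose details are not given here; the function names `within_group_sse` and `satisfies_k`, and the toy records, are assumptions made purely for illustration.

```python
def within_group_sse(groups):
    """Sum, over all groups, of the squared Euclidean distances
    between each record and the centroid of its group."""
    total = 0.0
    for group in groups:
        dim = len(group[0])
        # Centroid: attribute-wise mean of the records in the group.
        centroid = [sum(rec[j] for rec in group) / len(group) for j in range(dim)]
        total += sum((rec[j] - centroid[j]) ** 2 for rec in group for j in range(dim))
    return total


def satisfies_k(groups, k):
    """True if every group contains at least k records."""
    return all(len(group) >= k for group in groups)


# Hypothetical toy partition of five two-attribute records into two groups.
groups = [
    [(1.0, 2.0), (3.0, 4.0)],
    [(10.0, 10.0), (12.0, 14.0), (14.0, 12.0)],
]

print(satisfies_k(groups, 2))    # True: the smallest group has 2 records
print(satisfies_k(groups, 3))    # False: the first group has only 2 records
print(within_group_sse(groups))  # 20.0 (4.0 from the first group, 16.0 from the second)
```

In a microaggregated release each record would be replaced by its group centroid, so a lower SSE means less information loss, while a larger k generally means stronger privacy. This is the trade-off the publisher explores by varying k.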
