Generalized k-Ward microaggregation

Jun-Lin Lin, Chia-Chun Hung, Laksamee Khomnotai

doi:10.1109/wcica.2014.7053544

Abstract

Microaggregation is a commonly used technique for statistical disclosure control of microdata. It divides the microdata into groups such that each group contains no fewer than k records, where k is a user-specified parameter; then it replaces each record in a group with the group's centroid. The problem underlying microaggregation is called the k-Partitions problem. The k-Partitions problem is a constrained optimization problem where the objective is to minimize the information loss incurred by replacing the raw records with their respective centroids, and the constraint is that each group must contain no fewer than k records. In the literature, many clustering algorithms have been modified for the k-Partitions problem. For example, the k-Ward algorithm is derived from Ward's Hierarchical Clustering algorithm. In this paper, we propose a general form of the k-Ward algorithm, and compare its performance with the original k-Ward algorithm.
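
To make the replacement step and the size constraint concrete, the following is a minimal Python sketch. It is not the k-Ward algorithm or the generalized form proposed in the paper: it takes a partition that has already been chosen, checks that every group has at least k records, replaces each record with its group's centroid, and measures the resulting loss. The function names (`centroid`, `microaggregate`, `sse`) are hypothetical, and the use of the sum of squared errors (SSE) as the information-loss measure is an assumption, since the abstract does not name a specific measure.

```python
from typing import List, Sequence


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Attribute-wise mean of a non-empty list of numeric records."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def microaggregate(
    records: Sequence[Sequence[float]],
    groups: Sequence[Sequence[int]],
    k: int,
) -> List[List[float]]:
    """Replace every record with the centroid of its group.

    `groups` is a list of index lists that must form a partition of the
    records, and every group must contain at least k records.
    """
    seen = sorted(i for g in groups for i in g)
    if seen != list(range(len(records))):
        raise ValueError("groups must cover every record exactly once")

    masked: List[List[float]] = [[] for _ in records]
    for g in groups:
        if len(g) < k:
            raise ValueError("every group needs at least k records")
        c = centroid([records[i] for i in g])
        for i in g:
            masked[i] = list(c)
    return masked


def sse(
    records: Sequence[Sequence[float]],
    masked: Sequence[Sequence[float]],
) -> float:
    """Sum of squared errors between original and masked records
    (assumed here as the information-loss measure)."""
    return sum(
        (a - b) ** 2
        for r, m in zip(records, masked)
        for a, b in zip(r, m)
    )


# Illustrative data only: four two-attribute records, k = 2.
records = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
groups = [[0, 1], [2, 3]]
masked = microaggregate(records, groups, k=2)
loss = sse(records, masked)
```

In this toy example the two groups have centroids (2, 3) and (11, 12), and the SSE is 14. A partition with a lower SSE under the same size constraint would be a better solution to the k-Partitions problem.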
