Abstract

Cell suppression is a widely used technique for protecting sensitive information in statistical data presented in tabular form. Previous work on the subject concentrates mainly on two- and three-dimensional tables whose entries are subject to marginal totals. In this article, we address the problem of protecting sensitive data in a statistical table whose entries are linked by a generic system of linear constraints. This very general setting covers, among others, k-dimensional tables with marginals, as well as hierarchical and linked tables. In particular, we address the optimization problem known in the literature as the (complementary or secondary) cell suppression problem, in which the information loss due to suppression must be minimized. We introduce a new integer linear programming model and outline an enumerative algorithm for its exact solution. The algorithm can also be used as a heuristic procedure to find near-optimal solutions. Extensive computational results on a test bed of 1,160 real-world and randomly generated instances are presented, showing the effectiveness of the approach. In particular, we were able to solve to proven optimality four-dimensional tables with marginals as well as linked tables. To our knowledge, tables of this kind have never been solved optimally by previous authors.
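
As a toy illustration of why complementary (secondary) suppressions are needed, the sketch below checks which suppressed cells an attacker could recompute exactly from the published linear constraints. It is not the integer linear programming model of the article: it covers only the simplest attack, repeatedly solving any equation that has exactly one unknown, and it ignores the interval bounds an attacker could derive by linear programming. The function name `recoverable_cells`, the cell labels, and the 2x2 table with marginals are hypothetical and chosen for illustration.

```python
def recoverable_cells(constraints, suppressed):
    """Return the suppressed cells that can be recomputed exactly.

    constraints: list of equations, each given as the list of cell
                 labels that appear in it with a nonzero coefficient.
    suppressed:  iterable of cell labels withheld from publication.

    Assumption: the attacker only solves equations that contain a
    single unknown, then reuses the recovered value elsewhere.
    """
    unknown = set(suppressed)
    recovered = set()
    progress = True
    while progress:
        progress = False
        for cells in constraints:
            hidden = [c for c in cells if c in unknown]
            if len(hidden) == 1:
                # One unknown in a linear equation: it is determined.
                unknown.discard(hidden[0])
                recovered.add(hidden[0])
                progress = True
    return recovered


# Hypothetical 2x2 table with row totals r1, r2, column totals c1, c2
# and grand total t. Each list is one linear constraint.
constraints = [
    ["a11", "a12", "r1"],
    ["a21", "a22", "r2"],
    ["a11", "a21", "c1"],
    ["a12", "a22", "c2"],
    ["r1", "r2", "t"],
    ["c1", "c2", "t"],
]

# Suppressing only the sensitive cell a11 is useless: r1 - a12 reveals it.
only_primary = recoverable_cells(constraints, {"a11"})

# Suppressing all four interior cells defeats this simple attack.
with_secondary = recoverable_cells(constraints, {"a11", "a12", "a21", "a22"})
```

Here `only_primary` contains `a11`, while `with_secondary` is empty. Passing this check does not by itself guarantee protection, since an attacker may still narrow a suppressed value to a tight interval; choosing the complementary suppressions so that protection holds at minimum information loss is the optimization problem the article addresses.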
