Mathematical models for applying cell suppression methodology in statistical data protection

Juan-José Salazar-González

doi:10.1016/s0377-2217(02)00870-6

Abstract

This paper concerns the problem of protecting sensitive information in tabular data against different intruders. Statistical offices employ different schemes to solve this problem. One of them is the so-called cell suppression methodology, where some cell values can be suppressed. We present four mathematical models for the problem of finding a cell suppression pattern that minimizes the loss of information while guaranteeing protection level requirements for different sensitive cells and different intruders. This problem is more general than, for example, the so-called “common respondent problem” mentioned by Jewett [Disclosure analysis for the 1992 economic census; Working paper, United States Bureau of the Census, 1993] in statistical disclosure control. The first model corresponds to bi-level mathematical programming. The second model belongs to integer linear programming (ILP) and could be used on small-size tables where some nominal values are known to assume discrete values. The third model is also an ILP model, valid when the nominal values of the table are continuous numbers, with the advantage of containing a small number of variables (one 0–1 variable for each cell in the table). On the other hand, this model has a larger number of linear inequalities (related to the number of sensitive cells and the number of attackers). This paper addresses that disadvantage by generating the important inequalities dynamically when necessary. The overall algorithm follows a modern mathematical programming technique known as the branch-and-cut approach, and finds optimal solutions for medium-size tables. On large-size tables the approach can be used to find near-optimal solutions. The fourth model adds two continuous variables for each cell to the third model to give the statistical office more control over the loss of information, thus producing more accurate but still protected patterns.
The paper ends by pointing out an alternative methodology that produces patterns by collapsing all the different intruders into a single one, and compares it with the classical single-attacker methodology and with the above multi-attacker methodology.
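
To make the multi-attacker protection requirement concrete, the following minimal sketch computes, by brute force, the interval of values that an attacker can deduce for a sensitive cell under a given suppression pattern. It is an illustration only and not one of the four models of the paper, which rely on mathematical programming. The table values, the function names `attacker_interval` and `is_protected`, and the representation of an attacker as a dictionary of a priori integer bounds on cells are all assumptions made for this example. Row and column totals are assumed to be published.

```python
from itertools import product


def attacker_interval(table, suppressed, target, bounds=None, default=(0, 10)):
    """Return (min, max) of `target` over all integer tables the attacker considers possible.

    table      -- nominal values as a list of rows; marginal totals are assumed published
    suppressed -- set of (row, col) cells hidden by the suppression pattern
    target     -- the sensitive cell, which must belong to `suppressed`
    bounds     -- hypothetical attacker knowledge: {(row, col): (low, high)}
    default    -- a priori bounds assumed for every other suppressed cell
    """
    bounds = bounds or {}
    n_cols = len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(n_cols)]
    cells = sorted(suppressed)
    ranges = []
    for cell in cells:
        low, high = bounds.get(cell, default)
        ranges.append(range(low, high + 1))

    feasible = []
    for values in product(*ranges):
        # Fill the suppressed cells with one candidate assignment.
        cand = [row[:] for row in table]
        for (i, j), v in zip(cells, values):
            cand[i][j] = v
        rows_ok = [sum(row) for row in cand] == row_tot
        cols_ok = [sum(row[j] for row in cand) for j in range(n_cols)] == col_tot
        if rows_ok and cols_ok:
            feasible.append(cand[target[0]][target[1]])
    return min(feasible), max(feasible)


def is_protected(interval, value, lower_level, upper_level):
    """Check lower and upper protection levels against the attacker's interval."""
    low, high = interval
    return low <= value - lower_level and high >= value + upper_level


# Made-up 3x3 table; cell (1, 2) with value 5 is the sensitive cell.
table = [[4, 9, 2],
         [1, 3, 5],
         [3, 6, 2]]
pattern = {(1, 0), (1, 2), (2, 0), (2, 2)}
target = (1, 2)

# An outside attacker knows only the default bounds on suppressed cells.
outsider = attacker_interval(table, pattern, target)
# A respondent attacker knows its own contribution, cell (2, 2), exactly.
respondent = attacker_interval(table, pattern, target, bounds={(2, 2): (2, 2)})

print(outsider, is_protected(outsider, 5, lower_level=2, upper_level=1))
print(respondent, is_protected(respondent, 5, lower_level=2, upper_level=1))
```

In this example the same pattern protects the sensitive cell against the outside attacker but not against the respondent, which is why protection requirements have to be stated per sensitive cell and per attacker. Enumeration grows exponentially with the number of suppressed cells, so it is usable only on very small tables.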

Full Text