Abstract

We study the problem of protecting sensitive data in a two-dimensional statistical table when the non-sensitive table entries are made public along with the row and column totals. In particular, we address the NP-hard problem known in the literature as the (secondary) cell suppression problem. We introduce a new integer linear programming model and describe several new families of additional inequalities used to strengthen the linear relaxation of the model. We also propose exact and heuristic separation procedures and embed them within a branch-and-cut algorithm for the exact solution of the problem. The algorithm makes use of an efficient heuristic procedure to find near-optimal solutions. We report the exact solution of instances involving up to 250,000 cells and 10,000 sensitive cells, i.e., more than 3 orders of magnitude larger than those solved by previous techniques from the literature.

A statistical agency collects data to be processed and published. The raw material is information obtained from individual respondents. Usually, this data is obtained under a pledge of confidentiality: statistical agencies have the responsibility of not releasing any data or data summaries from which information about individual respondents (sensitive data) can be revealed. On the other hand, statistical agencies aim at publishing as much information as possible. This results in a trade-off between privacy rights and information loss, an issue of primary importance in practice. We refer the interested reader to Willenborg and de Waal (18) for an in-depth analysis of statistical disclosure control methodologies.

Starting in 1996, the European Union, through EUROSTAT (the European statistical office), supported a 3-year ESPRIT research project aimed at developing and testing new methodologies for statistical disclosure control. The project, coordinated by Dr. Leon Willenborg of CBS (Central Bureau of Statistics, Netherlands), involves several research groups from both academia and national statistical offices. We participate in the project by defining mathematical models and solution algorithms for protecting sensitive information in tabular data.

Cell suppression is one of the most widely applied techniques for disclosure avoidance. In this work we study the problem of applying this technique to protect sensitive information in a two-dimensional table of statistics, in which the non-sensitive data is

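To illustrate why suppressing only the sensitive cells is not enough once row and column totals are published, here is a minimal Python sketch. It is not the authors' model or algorithm. It assumes real-valued cells with no lower or upper bounds and ignores protection levels, and the function name `exposed_cells` is hypothetical. It relies on a standard observation for two-dimensional tables: treat rows and columns as nodes of a bipartite graph and each suppressed cell as an edge. If a suppressed cell lies on a cycle, its value can be shifted by +t/-t alternately around that cycle without changing any total, so it stays hidden. If it lies on no cycle, the totals determine it exactly, and secondary suppressions are needed to create such a cycle.

```python
from collections import defaultdict, deque


def exposed_cells(suppressed):
    """Return suppressed (row, col) cells an intruder can recompute exactly.

    Hypothetical helper. Assumes unbounded real cell values, published row
    and column totals, and no protection levels. A cell is exposed exactly
    when its edge lies on no cycle of the bipartite row/column graph.
    """
    cells = list(set(suppressed))
    adj = defaultdict(list)
    for r, c in cells:
        # Each suppressed cell is an edge between its row node and column node.
        adj[("r", r)].append((("c", c), (r, c)))
        adj[("c", c)].append((("r", r), (r, c)))

    def on_cycle(cell):
        # BFS from the cell's row node to its column node, skipping the cell's
        # own edge; reaching the column node means the edge closes a cycle.
        start, goal = ("r", cell[0]), ("c", cell[1])
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                return True
            for nxt, edge in adj[node]:
                if edge != cell and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    return sorted(cell for cell in cells if not on_cycle(cell))


# Only the sensitive cell is suppressed: the row 0 total reveals it.
print(exposed_cells([(0, 0)]))  # [(0, 0)]
# Secondary suppressions (0, 1), (1, 0), (1, 1) close a 4-cycle: nothing exposed.
print(exposed_cells([(0, 0), (0, 1), (1, 0), (1, 1)]))  # []
```

Hiding every sensitive cell on some cycle is only the weakest requirement. The full cell suppression problem typically also asks that the intruder's feasible range around each sensitive value be wide enough, and that the secondary suppressions cause as little information loss as possible. That optimization is what the integer programming model and branch-and-cut algorithm address.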