A Genetic Approach to Statistical Disclosure Control

Jim E Smith,Andrea T Staggemeier,Martin C Serpell,Alistair R Clark

doi:10.1109/tevc.2011.2159271

Abstract

Statistical disclosure control is the collective name for a range of tools used by data providers such as government departments to protect the confidentiality of individuals or organizations. When the published tables contain magnitude data such as turnover or health statistics, the preferred method is to suppress the values of certain cells. Assigning a cost to the information lost by suppressing any given cell creates the “cell suppression problem.” This consists of finding the minimum-cost solution that meets the confidentiality constraints. Solving this problem simultaneously for all of the sensitive cells in a table is NP-hard and not possible for medium-sized to large tables. In this paper, we describe the development of a heuristic tool for this problem which hybridizes linear programming (to solve a relaxed version for a single sensitive cell) with a genetic algorithm (to seek an order for considering the sensitive cells which minimizes the final cost). Considering a range of real-world and representative “artificial” datasets, we show that the method is able to provide relatively low-cost solutions for far larger tables than the optimal approach can tackle. We show that our genetic approach is able to significantly improve on the initial solutions provided by existing heuristics for cell ordering, and outperforms local search. This approach is then extended and applied to large statistical tables with over 200,000 cells.
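The hybrid described in the abstract, a genetic algorithm searching over the order in which sensitive cells are protected, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: `sequential_cost` is a toy greedy stand-in for the linear programming step (each sensitive cell picks the cheapest of a few hypothetical candidate suppression patterns, and cells already suppressed cost nothing extra), and the operators (order crossover, swap mutation, binary tournament selection) and all names and data are hypothetical choices.

```python
import random

def sequential_cost(order, candidates, cell_cost):
    """Toy stand-in for the per-cell LP: protect sensitive cells one at a
    time in `order`, reusing cells that are already suppressed."""
    suppressed = set()
    total = 0
    for cell in order:
        # Choose the candidate pattern that adds the least new cost.
        best = min(candidates[cell],
                   key=lambda pat: sum(cell_cost[c] for c in pat - suppressed))
        new_cells = best - suppressed
        total += sum(cell_cost[c] for c in new_cells)
        suppressed |= new_cells
    return total

def order_crossover(p1, p2, rng):
    """Copy a slice of p1 and fill the remaining positions in p2's order."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[i:j]
    taken = set(middle)
    rest = [g for g in p2 if g not in taken]
    return rest[:i] + middle + rest[i:]

def swap_mutation(perm, rng):
    """Swap two randomly chosen positions (needs at least two cells)."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def evolve_order(cells, fitness, pop_size=20, generations=50, seed=0):
    """Generational GA over permutations; returns the best order seen."""
    rng = random.Random(seed)
    population = [rng.sample(cells, len(cells)) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(population, 2), key=fitness)
            p2 = min(rng.sample(population, 2), key=fitness)
            offspring.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = offspring
        gen_best = min(population, key=fitness)
        if fitness(gen_best) < fitness(best):
            best = gen_best
    return best

# Hypothetical toy instance: three sensitive cells, three candidate
# secondary cells "a", "b", "c" with made-up suppression costs.
cell_cost = {"a": 3, "b": 2, "c": 2}
candidates = {
    "s1": [{"a"}, {"b"}],
    "s2": [{"a"}, {"c"}],
    "s3": [{"a"}],
}
cells = ["s1", "s2", "s3"]

def toy_fitness(order):
    return sequential_cost(order, candidates, cell_cost)

best_order = evolve_order(cells, toy_fitness)
```

In this toy instance the order matters: protecting `s3` first suppresses `a`, which then protects `s1` and `s2` at no extra cost, whereas the order `s1, s2, s3` pays for `b`, `c`, and `a` separately. In the approach the abstract describes, the fitness of an ordering would instead come from solving the relaxed linear programs cell by cell.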

Highlights

In today’s “Knowledge Economy”, many organisations hold large amounts of data gathered from a variety of sources, some of which they wish to publish, sell, or otherwise exploit and disseminate, whilst respecting the privacy of individual sources.
To reduce this problem, Castro [5] has developed a new minimum-L2-distance perturbation method which maintains both additivity and margin totals and has been shown to protect three-dimensional tables with up to 1,000,000 cells.
… processing a specified sequence of the sensitive cells, gradually building up a secondary suppression pattern so as to meet the protection constraints, while minimising the information loss.
As currently implemented, the output from the linear programs (LPs) heuristic is not available to the user, and because of the large numbers of constraints and variables, the “optimal” approach is only possible for tables with a few hundreds …
… methods as it involves solving a difficult combinatorial optimisation. It is the objective of this paper to extend cell suppression, which preserves more of the original cell values than perturbation methods, so that it can be …

Summary

INTRODUCTION

BACKGROUND

The Incremental Attacker

METHODOLOGY

Procedure

Analysis

REDUCING THE COST OF THE FITNESS FUNCTION

PROTECTING LARGER STATISTICAL TABLES

Findings

VIII. CONCLUSIONS AND SUGGESTED FUTURE

Journal: IEEE Transactions on Evolutionary Computation: a publication of the IEEE Neural Networks Council
Publication Date: Jun 1, 2012
Citations: 10
License type: CC BY

Similar Papers

A genetic approach to statistical disclosure control
Jim E Smith ... Alistair R Clark
08 Jul 2009

Models and algorithms for the 2-dimensional cell suppression problem in statistical disclosure control
Matteo Fischetti ... Juan José Salazar
Mathematical Programming | VOL. 84
01 Feb 1999

Initial application of ant colony optimisation to statistical disclosure control
Martin Serpell ... James Smith
06 Jul 2013

Stabilized Benders Methods for Large-Scale Combinatorial Optimization, with Application to Data Privacy
Daniel Baena ... Jordi Castro
Management Science | VOL. 66
16 Apr 2018
