Abstract

As part of statistical disclosure control National Statistical Offices can only deliver confidential data being sufficiently protected meeting national legislation. When releasing confidential microdata to users, data holders usually apply what are called anonymisation methods to the data. In order to fulfil the privacy requirements, it is possible to measure the level of privacy of some confidential data file by simulating potential data intrusion scenarios matching publicly or commercially available data with the entire set of confidential data, both sharing a non-empty set of variables (quasi-identifiers). According to real world microdata, incompatibility between data sets and not unique combinations of quasi-identifiers are very likely. In this situation, it is nearly impossible to decide whether or not two records refer to the same underlying statistical unit. Even a successful assignment of records may be a fruitless disclosure attempt, if a rationale data intruder would keep distance from that match. The paper lines out that disclosure risks estimated thus far are overrated in the sense that revealed information is always a combination of both, systematically derived results and non-negligible random assignment.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call