Abstract

As part of statistical disclosure control National Statistical Offices can only deliver confidential data being sufficiently protected meeting national legislation. When releasing confidential microdata to users, data holders usually apply what are called anonymisation methods to the data. In order to fulfil the privacy requirements, it is possible to measure the level of privacy of some confidential data file by simulating potential data intrusion scenarios matching publicly or commercially available data with the entire set of confidential data, both sharing a non-empty set of variables (quasi-identifiers). According to real world microdata, incompatibility between data sets and not unique combinations of quasi-identifiers are very likely. In this situation, it is nearly impossible to decide whether or not two records refer to the same underlying statistical unit. Even a successful assignment of records may be a fruitless disclosure attempt, if a rationale data intruder would keep distance from that match. The paper lines out that disclosure risks estimated thus far are overrated in the sense that revealed information is always a combination of both, systematically derived results and non-negligible random assignment.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.