Abstract

© 2019 - IOS Press and the authors. When producing anonymised microdata for research, national statistics institutes (NSIs) identify a number of 'risk scenarios' of how intruders might seek to attack a confidential dataset. This approach has been criticised for focusing on data protection only without sufficient reference to other aspects of confidentiality management, and for emphasising theoretical possibilities rather than evidence-based attacks. An alternative 'user-centred' approach offers more efficient outcomes and is more in tune with the spirit of data protection legislation, as well as the letter. The user-centred approach has been successfully adopted in controlled research facilities. However, it has not been systematically applied beyond these specialist facilities. This paper shows how the same approach can be applied to distributed data with limited NSI control. It describes the creation of a scientific use file (SUF) for business microdata, traditionally hard to protect. This case study demonstrates that an alternative perspective can have dramatically different outcomes as compared with established anonymization strategies; in the case study discussed, the alternative approach reduces 100% perturbation of continuous variables to under 1%. The paper also considers the implications for future developments in official statistics, such as administrative data and 'big data'.

Highlights

  • One of the key functions of national statistics institutes (NSIs) is to produce research datasets from the same sources used for aggregate statistics

  • This paper argues that the strategy used by NSIs to identify confidentiality protection measures is seriously misguided

  • Using as an example the recent creation by the authors of a ‘scientific use file’ (SUF) from multinational business survey data, the paper demonstrates that an alternative perspective can have dramatically different outcomes: in this case, from 100% perturbation of all continuous variables to perturbation of under 1% of the observations for just one variable


Summary

Introduction

One of the key functions of national statistics institutes (NSIs) is to produce research datasets from the same sources used for aggregate statistics. As several authors have noted, NSIs tend to be risk-averse, more comfortable with the ‘policing’ than the ‘sharing’ approach to data access, and focused on the statistical product rather than the use to which it is put. This leads to best practice models that emphasise the protection of data even in the most extreme circumstances. This paper argues that the strategy used by NSIs to identify confidentiality protection measures is seriously misguided. It leads to an excessively conservative approach which is not supported by evidence or required by law, and which can frustrate users.

Common approaches to anonymisation
Critique of common perspective
Example of an evidence-based risk assessment: the 2010 CIS8
Impact
Findings
Discussion and conclusion