A Probabilistic Procedure for Anonymisation, for Assessing the Risk of Re-identification and for the Analysis of Perturbed Data Sets

Harvey Goldstein, Natalie Shlomo

doi:10.2478/jos-2020-0005

Abstract

The requirement to anonymise data sets that are to be released for secondary analysis should be balanced against the need for their analysis to provide efficient and consistent parameter estimates. This article proposes integrating the processes of anonymisation and data analysis. The first stage adds random noise with known distributional properties to some or all variables in a released (already pseudonymised) data set. The setting is one in which the values of some identifying and sensitive variables for data subjects of interest are also available to an external ‘attacker’ who wishes to identify those data subjects in order to interrogate their records in the data set. The second stage consists of specifying the model of interest so that parameter estimation accounts for the added noise. Where the data provider makes the characteristics of the noise available to the analyst, we propose a new method that allows a valid analysis. This is formally a measurement error model, and we describe a Bayesian MCMC algorithm that recovers consistent estimates of the true model parameters. A new method for handling categorical data is presented. The article also shows how an appropriate noise distribution can be determined.
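The two-stage idea can be illustrated with a minimal sketch. It is not the Bayesian MCMC algorithm described in the article: it swaps in a simple method-of-moments correction for a linear regression slope, and it assumes noise is added to a single continuous predictor only. All names and values (`true_beta`, `noise_sd`, the sample size) are hypothetical and chosen for illustration.

```python
import random
import statistics


def slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def corrected_slope(x_noisy, y, noise_var):
    """Moment-corrected slope: subtract the known noise variance
    from the sample variance of the perturbed predictor."""
    n = len(x_noisy)
    mx, my = statistics.fmean(x_noisy), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_noisy, y)) / (n - 1)
    sxx = statistics.variance(x_noisy)
    return sxy / (sxx - noise_var)


rng = random.Random(0)
n = 20000
true_beta = 2.0   # hypothetical true slope
noise_sd = 1.0    # hypothetical noise s.d., assumed to be released by the data provider

# Stage 0: the confidential data (never released in this sketch).
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_beta * xi + rng.gauss(0.0, 0.5) for xi in x]

# Stage 1: the provider perturbs the predictor with noise of known variance.
x_released = [xi + rng.gauss(0.0, noise_sd) for xi in x]

# Stage 2: the analyst estimates the slope, with and without the correction.
naive = slope(x_released, y)
corrected = corrected_slope(x_released, y, noise_sd ** 2)
print(f"naive: {naive:.2f}, corrected: {corrected:.2f}")
```

With equal predictor and noise variances, the naive slope is attenuated towards roughly half the true value, while the corrected estimate should lie close to `true_beta`. The correction is only possible because the noise variance is known to the analyst, which is the condition the abstract places on a valid analysis.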

Journal: Journal of Official Statistics
Publication Date: Mar 1, 2020
Citations: 9
License type: CC BY-NC-ND 3.0

Similar Papers

Parameter estimation in non-Gaussian noise
C G Constable
Geophysical Journal International | VOL. 94
01 Jul 1988

Comparison of total least squares and instrumental variable methods for parameter estimation of transfer function models
Sabine Van Huffel ... Joos Vandewalle
International Journal of Control | VOL. 50
01 Oct 1989

Partial Effects for Binary Outcome Models with Unobserved Heterogeneity
Lucas Núñez
The Journal of Politics | VOL. 84
09 Dec 2021

Marginal Regression for Binary Longitudinal Data in Adaptive Clinical Trials
Brajendra C Sutradhar ... Atanu Biswas
Scandinavian Journal of Statistics | VOL. 32
01 Mar 2005
