Abstract
One specific problem statistical offices and research institutes are faced with when releasing microdata is the preservation of confidentiality. Traditional methods to avoid disclosure often destroy the structure of the data, and information loss is potentially high. In this paper an alternative technique of creating scientific–use files is discussed, which reproduces the characteristics of the original data quite well. It is based on Fienberg (1997, 1994) who estimates and resamples from the empirical multivariate cumulative distribution function of the data in order to get synthetic data. The procedure creates data sets – the resample – which have the same characteristics as the original survey data. The paper includes some applications of this method with (a) simulated data and (b) innovation survey data, the Mannheim Innovation Panel (MIP), and a comparison between resampling and a common method of disclosure control (disturbance with multiplicative error) with regard to confidentiality on the one hand and the appropriateness of the disturbed data for different kinds of analyses on the other. The results show that univariate distributions can be better reproduced by unweighted resampling. Parameter estimates can be reproduced quite well if the resampling procedure implements the correlation structure of the original data as a scale or if the data is multiplicatively perturbed and a correction term is used. On average, anonymization of data with multiplicatively perturbed values protects better against re–identification than the various resampling methods used.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.