Abstract

Statistical disclosure control (SDC) is a balancing act between mandatory data protection and the understandable demand from researchers for access to original data. In this paper, a family of methods is defined to ‘mask’ sensitive variables before data files are released. In the first step, the variable to be masked is ‘cloned’ (C). In the second step, the duplicated variable, as a whole or only in part, is ‘suppressed’ (S). In the third step, data are ‘imputed’ (I) for these artificially missing values. The original variable can then be deleted, and its masked substitute serves as the basis for data analysis. The idea of this general ‘CSI framework’ is to open up the wide field of imputation methods for SDC. The method applied in the I-step can make use of available auxiliary variables, including the original variable. Different members of this family of methods that deliver variance estimators are discussed in some detail. Furthermore, a simulation study analyzes various methods belonging to the family with respect to both the quality of parameter estimation and privacy protection. Based on the results obtained, recommendations are formulated for different estimation tasks.
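
The three steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `csi_mask`, the parameters `suppress_frac` and `seed`, the single auxiliary variable `x`, and the choice of stochastic regression imputation for the I-step are all assumptions made for illustration. The abstract leaves the imputation method open, so any other imputation method could be substituted in the last step.

```python
import numpy as np

def csi_mask(y, x, suppress_frac=1.0, seed=0):
    """Illustrative clone-suppress-impute (CSI) masking of a variable y.

    Hypothetical helper: the imputation model below (linear regression on
    one auxiliary variable x plus a resampled residual) is an assumed
    example, not a method taken from the paper.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size

    # C-step: clone the sensitive variable.
    z = y.copy()

    # S-step: suppress the whole clone or only a random part of it.
    n_suppress = int(round(suppress_frac * n))
    suppressed = np.zeros(n, dtype=bool)
    suppressed[rng.choice(n, size=n_suppress, replace=False)] = True
    z[suppressed] = np.nan

    # I-step: impute the artificially missing values. The model is fitted
    # on the original variable, which the I-step is allowed to use.
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    noise = rng.choice(residuals, size=n_suppress, replace=True)
    z[suppressed] = design[suppressed] @ beta + noise

    # The caller would release z and delete the original y.
    return z, suppressed
```

With `suppress_frac=1.0` every value of the clone is replaced by an imputed value; smaller fractions leave the remaining values identical to the original, which trades privacy protection against estimation quality.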
