Abstract
The paper proposes a new disclosure limitation procedure based on simulation. The key feature of the proposal is to protect actual microdata by drawing artificial units from a probability model that is estimated from the observed data. The model is designed to maintain selected characteristics of the empirical distribution, thus providing a partial representation of it. The characteristics we focus on are the expected values of a set of functions; these are constrained to equal their corresponding sample averages, so that the simulated data reproduce the sample characteristics on average. If the set of constraints covers the parameters of interest to a user, information loss is controlled; at the same time, because the model does not preserve individual values, re-identification attempts are impaired: synthetic individuals correspond to actual respondents with very low probability. Disclosure is mainly discussed from the viewpoint of record re-identification. Under this definition, since the pledge of confidentiality involves only the actual respondents, the release of synthetic units should in principle remove the concern for confidentiality. The simulation model is built on the Italian sample from the Community Innovation Survey (CIS). The approach can be applied more generally, and is especially suited to quantitative traits. The model has a semi-parametric component, based on the maximum entropy principle, and, in this application, a parametric component, based on regression. The maximum entropy principle is exploited to match data traits; moreover, entropy measures the uncertainty of a distribution: its maximisation leads to a distribution that is consistent with the given information but maximally noncommittal with regard to missing information. Application results show that the fixed characteristics are preserved and that other features, such as marginal distributions, are well represented. Model specification is clearly a major point; related issues are the selection of characteristics, goodness of fit and the strength of dependence relations.
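As a toy illustration of the moment-constrained maximum entropy idea summarised above, the sketch below fits a maximum entropy distribution on a small discrete support so that its expected value equals a sample average, then draws synthetic units from it. It is not the model fitted to the CIS data: the observed values, the support, the single constraint function f(x) = x, and the names `maxent_pmf`, `observed`, `support`, and `synthetic` are all assumptions made for illustration. It relies only on the standard result that, under a mean constraint, the maximum entropy probabilities have the exponential form p_i ∝ exp(λ x_i).

```python
import math
import random

def maxent_pmf(support, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum entropy pmf on `support` subject to E[X] = target_mean.

    The solution has the form p_i proportional to exp(lam * x_i); the
    multiplier lam is found by bisection, since the implied mean is
    increasing in lam. Hypothetical helper, for illustration only.
    """
    def pmf(lam):
        shift = max(lam * x for x in support)  # avoid overflow in exp
        weights = [math.exp(lam * x - shift) for x in support]
        total = sum(weights)
        return [w / total for w in weights]

    def implied_mean(lam):
        return sum(p * x for p, x in zip(pmf(lam), support))

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if implied_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return pmf((lo + hi) / 2.0)

# Hypothetical "observed" microdata and an assumed support for the trait.
observed = [1, 2, 2, 3, 7]
support = list(range(0, 11))

# Constraint: the model mean must equal the sample average.
target = sum(observed) / len(observed)
p = maxent_pmf(support, target)

# Synthetic units are drawn from the fitted model, not copied from `observed`.
rng = random.Random(0)
synthetic = rng.choices(support, weights=p, k=1000)
```

By construction the fitted distribution matches the sample mean, while individual synthetic values are fresh draws from the model rather than copies of observed records. With several constraint functions the same exponential form holds, with one multiplier per constraint, and a multivariate solver would replace the bisection step.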