Using prior information in privacy-protecting survey designs for categorical sensitive variables

Heiko Groenitz

doi:10.1007/s00362-013-0573-3

Abstract

Direct questioning on sensitive characteristics, such as annual income, tax evasion, insurance fraud or students’ cheating behavior, is problematic because it often results in answer refusal or untruthful responses. For this reason, several randomized response (RR) and nonrandomized response (NRR) survey designs, which increase cooperation by protecting the respondents’ privacy, have been proposed in the literature. In the first part of this paper, we present a Bayesian extension of a recently published NRR method for multichotomous sensitive variables. With this extension, the investigator can incorporate prior information on the parameter, e.g., from a previous study, into the estimation and thereby improve the estimation precision. In particular, we derive different point and interval estimates by means of the EM algorithm and data augmentation. The performance of the considered estimators is evaluated in a simulation study. In the second part of this article, we show that for any RR or NRR model addressing the estimation of the distribution of a categorical sensitive characteristic, the design matrices of the model play the central role in Bayes estimation, whereas the concrete answer scheme is irrelevant. This observation enables us to generalize the calculations from the first part and to establish a common approach to Bayes inference in RR and NRR designs for categorical sensitive variables. This unified approach even covers multi-stage models and models that require more than one sample.
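
The role of the design matrix can be illustrated with a minimal sketch. It is not the estimator derived in the paper; it only assumes a single sample, a design matrix `C` with entries `C[a, x] = P(answer = a | sensitive category = x)`, and a Dirichlet prior with parameters `delta` (all at least 1 for the EM step). The function names, the example matrix and the counts are hypothetical. Under these assumptions, the EM algorithm yields a posterior mode and data augmentation yields posterior draws, and both use the answer scheme only through `C`.

```python
import numpy as np


def em_posterior_mode(counts, C, delta, n_iter=500, tol=1e-10):
    """Posterior mode of pi under a Dirichlet(delta) prior via EM.

    Assumes C[a, x] = P(answer a | sensitive category x) and delta >= 1.
    """
    counts = np.asarray(counts, dtype=float)
    C = np.asarray(C, dtype=float)
    delta = np.asarray(delta, dtype=float)
    k = C.shape[1]
    pi = np.full(k, 1.0 / k)  # start from the uniform distribution
    for _ in range(n_iter):
        # E-step: P(X = x | A = a) under the current pi
        joint = C * pi
        post = joint / joint.sum(axis=1, keepdims=True)
        m = counts @ post  # expected latent counts per sensitive category
        # M-step: mode of the Dirichlet(delta + m) distribution
        new_pi = (m + delta - 1.0) / (counts.sum() + delta.sum() - k)
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def data_augmentation(counts, C, delta, n_draws=2000, burn_in=500, seed=0):
    """Posterior draws of pi by alternating latent counts and pi."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=int)
    C = np.asarray(C, dtype=float)
    delta = np.asarray(delta, dtype=float)
    k = C.shape[1]
    pi = np.full(k, 1.0 / k)
    draws = np.empty((n_draws, k))
    for t in range(burn_in + n_draws):
        joint = C * pi
        post = joint / joint.sum(axis=1, keepdims=True)
        # Imputation step: split each answer count over the sensitive categories
        m = np.zeros(k)
        for a, n_a in enumerate(counts):
            m += rng.multinomial(n_a, post[a])
        # Posterior step: draw pi from the complete-data posterior
        pi = rng.dirichlet(delta + m)
        if t >= burn_in:
            draws[t - burn_in] = pi
    return draws


# Hypothetical design matrix (columns sum to 1) and observed answer counts
C_example = np.array([[0.6, 0.2, 0.2],
                      [0.2, 0.6, 0.2],
                      [0.2, 0.2, 0.6]])
answer_counts = [30, 30, 40]
prior = [2.0, 2.0, 2.0]

mode = em_posterior_mode(answer_counts, C_example, prior)
samples = data_augmentation(answer_counts, C_example, prior)
posterior_mean = samples.mean(axis=0)
# Equal-tailed 95% credible interval for each category proportion
interval = np.percentile(samples, [2.5, 97.5], axis=0)
```

Replacing `C_example` by the design matrix of another RR or NRR model leaves both functions unchanged, which is the sense in which the concrete answer scheme does not matter in this sketch.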
