The randomized response technique offers an effective way to reduce potential bias from nonresponse and untruthful responses when asking questions about sensitive behaviors or beliefs. The technique is also used for statistical disclosure control of public-use data files released by statistical agencies such as the U.S. Census Bureau. In both cases, the technique works by randomizing the actual survey responses using a known randomization model. When asking sensitive survey questions, the respondents themselves randomize their responses and only the randomized responses are collected, whereas in disclosure control, the survey agency randomizes the responses after collecting the survey data and before releasing them for public use. This paper considers estimating the finite population mean from a survey where randomized responses are available for the study variable along with complete non-randomized auxiliary information. We define and study a class of nonparametric model-assisted estimators that make efficient use of the available auxiliary information and account for the complex survey design. The asymptotic properties of the proposed estimators are derived and a bootstrap variance estimator is presented. The finite sample performance of the proposed estimators is studied via extensive simulations covering a wide range of relationships between the study variable and the auxiliary variable. The empirical results support the theoretical analyses and suggest that our proposed estimators are superior to existing estimators in most cases. Furthermore, the proposed methods are illustrated using real data from the 2015 U.S. Consumer Expenditure Survey.
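
To make the idea of a "known randomization model" concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes an additive scrambling model in which each respondent reports the true value plus zero-mean noise of known standard deviation, and it uses a simple weighted mean under simple random sampling rather than a nonparametric model-assisted estimator with auxiliary information. The population, the noise level, and the function names `randomize_additive` and `weighted_mean` are all hypothetical and chosen only for illustration.

```python
import random


def randomize_additive(y, rng, noise_sd=10.0):
    """Assumed scrambling model: report y plus zero-mean Gaussian noise."""
    return y + rng.gauss(0.0, noise_sd)


def weighted_mean(values, weights):
    """Design-weighted mean of the reported values (ratio of weighted sums)."""
    total_weight = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total_weight


rng = random.Random(2015)

# Hypothetical finite population of a sensitive quantitative variable.
population = [rng.uniform(20.0, 80.0) for _ in range(5000)]
true_mean = sum(population) / len(population)

# Simple random sample without replacement; equal design weights N / n.
n = 1000
sample = rng.sample(population, n)
weights = [len(population) / n] * n

# Only the randomized responses are observed by the analyst.
reported = [randomize_additive(y, rng) for y in sample]

# Because the noise has mean zero, the weighted mean of the reported
# values remains approximately unbiased for the population mean.
estimate = weighted_mean(reported, weights)

print(f"true mean: {true_mean:.2f}")
print(f"estimate from randomized responses: {estimate:.2f}")
```

Under this assumed model the randomization leaves the estimator approximately unbiased but inflates its variance, which is the efficiency loss that auxiliary information can help recover.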