Abstract
Standard corrections for missing data rely on the strong and generally untestable assumption of missing at random. Heckman-type selection models relax this assumption, but have been criticized because they typically require a selection variable which predicts non-response but not the outcome of interest, and can impose bivariate normality. In this paper we illustrate an application using a copula methodology which does not rely on bivariate normality. We implement this approach in data on HIV testing at a demographic surveillance site in rural South Africa which are affected by non-response. Randomized incentives are the ideal selection variable, particularly when implemented ex ante to deal with potential missing data. However, elements of survey design may also provide a credible method of correcting for non-response bias ex post. For example, although not explicitly randomized, allocation of food gift vouchers during our survey was plausibly exogenous and substantially raised participation, as did effective survey interviewers. Based on models with receipt of a voucher and interviewer identity as selection variables, our results imply that 37% of women in the population under study are HIV positive, compared to imputation-based estimates of 28%. For men, confidence intervals are too wide to reject the absence of non-response bias. Consistent results obtained when comparing different selection variables and error structures strengthen these conclusions. Our application illustrates the feasibility of the selection model approach when combined with survey metadata.
Highlights
Because of the implications for estimation, adjusting for missing data is an important component of program evaluation.
We build on this analysis by illustrating an application using two selection variables based on survey design: a food gift voucher and interviewer identity, which, although not randomized, are plausibly exogenous in this survey context.
Most standard approaches for dealing with missing data rely on assuming missing at random (MAR), which may not be realistic if there are reasons to suspect participation is correlated with outcomes after controlling for observed characteristics.
Summary
Because of the implications for estimation, adjusting for missing data is an important component of program evaluation. The ideal selection variable in this context is a randomized incentive or survey intervention, because it is guaranteed to be unrelated to the outcome (in expectation) other than through any effect on participation. Because this approach is relatively rare, there are not many opportunities to leverage randomization to correct for missing data. We adopt the copula-based framework developed in Marra et al. (2017), which allows flexible specification of unobserved dependence using various distributional forms. We build on this analysis by illustrating an application using two selection variables based on survey design: a food gift voucher and interviewer identity, which, although not randomized, are plausibly exogenous in this survey context. We argue that showing results are robust to alternative exclusion restrictions and different distributional assumptions, as this framework allows, strengthens the conclusions from selection models.
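As a rough sketch of the kind of model described above, a copula-based selection model for a binary outcome can be written as follows. The notation ($S_i$, $Y_i$, $z_i$, $x_i$, $\gamma$, $\beta$, $\theta$, $C$) and the probit margins are assumptions chosen for illustration; they are not taken from the source, and the specification in Marra et al. (2017) may differ in detail.

```latex
% Illustrative notation (assumed, not from the source):
%   S_i = 1 if person i participates in HIV testing, 0 otherwise
%   Y_i = 1 if person i is HIV positive (observed only when S_i = 1)
%   x_i = covariates in the outcome equation
%   z_i = x_i plus the selection variables (e.g. voucher receipt, interviewer identity)
%   C(., .; theta) = a copula with dependence parameter theta
\begin{align*}
p_{S,i} &= \Pr(S_i = 1 \mid z_i) = \Phi(z_i^\top \gamma), \\
p_{Y,i} &= \Pr(Y_i = 1 \mid x_i) = \Phi(x_i^\top \beta), \\
\Pr(S_i = 1, Y_i = 1 \mid x_i, z_i) &= C(p_{S,i},\, p_{Y,i};\, \theta), \\
\ell(\gamma, \beta, \theta) &= \sum_{i=1}^{n} \Big[ (1 - S_i)\log(1 - p_{S,i})
  + S_i Y_i \log C(p_{S,i}, p_{Y,i}; \theta) \\
&\qquad\quad + S_i (1 - Y_i) \log\big(p_{S,i} - C(p_{S,i}, p_{Y,i}; \theta)\big) \Big], \\
\widehat{\Pr}(Y = 1) &= \frac{1}{n} \sum_{i=1}^{n} \Phi(x_i^\top \hat{\beta}).
\end{align*}
```

Under these assumptions, the independence copula $C(u, v) = uv$ corresponds to participation being unrelated to HIV status given covariates, in which case the likelihood factorizes and the outcome equation is estimated from participants alone. A Gaussian copula with probit margins gives the familiar bivariate probit selection model, and substituting other copula families is what relaxes bivariate normality. The exclusion restriction is that the selection variables enter $z_i$ but not $x_i$.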