Abstract

This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments, that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network.

Highlights

  • Respondent driven sampling (RDS) is a popular means of sampling difficult to survey populations

  • This paper contributes to the literature on sampling hidden and hard to reach populations, and on Respondent Driven Sampling in particular, by focusing on biased sampling variance estimation, an issue that has only rarely been addressed to date [14,18]

  • If the RDS estimators of sampling variance are biased, researchers cannot trust confidence intervals and hypothesis tests derived from these estimators


Summary

Introduction

Respondent driven sampling (RDS) is a popular means of sampling difficult to survey populations. The sampling variance of random walks on graphs, and by extension of other chain referral methods like RDS, depends on the specific network structure of the population under study, which determines the closeness of nodes in the network and their covariance [6]; researchers do not know how closely the VHE or SBE can approximate it [14]. The First Order Markov (FOM) assumption holds that RDS recruitment can be modeled as a FOM process on the nodal attribute of interest, where transitions between states depend solely on the prior state and not on a higher order sequence of prior states [23]. It is a convenient assumption for estimating RDS sampling variance, because a single RDS sample consists of a sequence of observed cases rather than the whole (population) network. Using Eq (5), we can write an estimate of the sampling variance of a size S random walk sample as the average of all the possible covariances among the population that the walk could take on G.
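The dependence of random walk sampling variance on network structure can be illustrated with a small simulation. The sketch below is not the paper's estimator: it assumes a toy graph of two four-node cliques joined by a single bridge, a binary attribute that is 1 in one clique and 0 in the other, and a simple with-replacement random walk rather than a full RDS recruitment process. All function names (`transition_matrix`, `two_clique_graph`, `random_walk_sample`, `variance_of_walk_means`) are hypothetical. It compares the simulated variance of the walk's sample mean with the variance an independent sample of the same size would have, and tabulates the first-order transition proportions that a single observed chain provides.

```python
import random


def transition_matrix(chain, states):
    """Row-normalized first-order transition proportions from one chain."""
    counts = {a: {b: 0 for b in states} for a in states}
    for prev, nxt in zip(chain, chain[1:]):
        counts[prev][nxt] += 1
    matrix = {}
    for a in states:
        total = sum(counts[a].values())
        matrix[a] = {b: counts[a][b] / total if total else 0.0 for b in states}
    return matrix


def two_clique_graph():
    """Toy network (an assumption): two 4-node cliques joined by edge 3-4."""
    adjacency = {n: [] for n in range(8)}
    for block in (range(0, 4), range(4, 8)):
        for u in block:
            for v in block:
                if u != v:
                    adjacency[u].append(v)
    adjacency[3].append(4)
    adjacency[4].append(3)
    # The attribute is perfectly clustered: 1 in the first clique, 0 in the second.
    attribute = {n: 1 if n < 4 else 0 for n in range(8)}
    return adjacency, attribute


def random_walk_sample(adjacency, attribute, size, rng):
    """Attribute values along a with-replacement random walk of `size` nodes."""
    node = rng.choice(sorted(adjacency))
    values = [attribute[node]]
    for _ in range(size - 1):
        node = rng.choice(adjacency[node])
        values.append(attribute[node])
    return values


def variance_of_walk_means(adjacency, attribute, size, n_walks, seed=0):
    """Simulated variance of the sample mean across many independent walks."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_walks):
        walk = random_walk_sample(adjacency, attribute, size, rng)
        means.append(sum(walk) / size)
    grand_mean = sum(means) / n_walks
    return sum((m - grand_mean) ** 2 for m in means) / (n_walks - 1)


adjacency, attribute = two_clique_graph()
S = 10  # sample size of each walk

walk_var = variance_of_walk_means(adjacency, attribute, S, n_walks=2000)

# Variance of the mean under independent sampling with the same S, for reference.
p = sum(attribute.values()) / len(attribute)
srs_var = p * (1 - p) / S

# First-order transition proportions seen along one longer observed chain.
one_chain = random_walk_sample(adjacency, attribute, 500, random.Random(1))
observed_transitions = transition_matrix(one_chain, [0, 1])

print("simulated variance of walk means:", walk_var)
print("independent-sampling variance:   ", srs_var)
print("observed first-order transitions:", observed_transitions)
```

Because the walk tends to stay inside one clique for many steps, successive observations are strongly correlated and the simulated variance of the mean exceeds the independent-sampling figure. The transition table is all that a single chain reveals about this structure, which is why a variance estimator built on it inherits the FOM assumption.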
Summary of VHE Bias
Results
Conclusions