Abstract

Population composition is often estimated by double sampling, in which the value of a covariate is recorded for each of a large number of randomly selected units, and both the covariate and the exact class to which the unit belongs are recorded for a smaller sample. The cross-classified sample can be used to estimate the classification rates, and these, in turn, can be combined with the estimated distribution of the covariate to obtain a better estimate of the population composition than that obtained by directly observing the identity of the individuals in a small sample. There are two approaches to this problem, distinguished by the way in which the classification rates are defined. The simpler approach uses estimates of the probability P(i|j) that the unit is actually in class i given that the covariate is in class j. The more complicated approach uses estimates of the probability P(j|i) that the covariate falls in class j given that the unit is actually in class i. The latter approach involves estimating more parameters than the former but does not require the two samples to be drawn from the same population. We show that the two approaches can be combined when there are multiple surveys. For example, one might conduct a disease survey for several years; in each year the accurate technique, the error-prone technique, or both may be applied to samples. The sensitivities and specificities of the error-prone test are assumed constant across surveys. Generalizations allow for more than one error-prone classifier and for partial verification (estimation of misclassification rates by applying the accurate technique to fixed subsamples from each error-prone category). The general approach is illustrated by considering a repeated survey for malaria.
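The difference between the two approaches can be made concrete with a small numerical sketch. It is not taken from the paper: the function names, the two-class setting, and all counts below are hypothetical, chosen only for illustration. The first function applies estimated rates P(i|j) to the covariate distribution from the large sample; the second inverts estimated rates P(j|i) (sensitivity and specificity), which in the two-class case reduces to the familiar Rogan-Gladen correction.

```python
def estimate_from_p_i_given_j(counts, q):
    """Approach based on P(i|j): p[i] = sum_j P(i|j) * q[j].

    counts[i][j] = units in the cross-classified sample with true class i
                   and covariate class j (hypothetical data).
    q[j]         = covariate distribution estimated from the large sample.
    """
    n_true = len(counts)
    n_cov = len(q)
    # Column totals: number of validated units in each covariate class j.
    col_totals = [sum(counts[i][j] for i in range(n_true)) for j in range(n_cov)]
    return [
        sum(counts[i][j] / col_totals[j] * q[j] for j in range(n_cov))
        for i in range(n_true)
    ]


def estimate_from_p_j_given_i(counts, q1):
    """Approach based on P(j|i), two classes only.

    Solves q1 = sens * p1 + (1 - spec) * (1 - p1) for p1, where q1 is the
    proportion of covariate-positive units in the large sample.
    """
    sens = counts[1][1] / (counts[1][0] + counts[1][1])  # P(j=1 | i=1)
    spec = counts[0][0] / (counts[0][0] + counts[0][1])  # P(j=0 | i=0)
    return (q1 - (1.0 - spec)) / (sens + spec - 1.0)


# Hypothetical cross-classified sample: rows are true class, columns covariate.
counts = [[80, 20],   # truly negative: 80 test negative, 20 test positive
          [10, 90]]   # truly positive: 10 test negative, 90 test positive
q = [0.7, 0.3]        # hypothetical covariate distribution in the large sample

p_forward = estimate_from_p_i_given_j(counts, q)
p1_inverse = estimate_from_p_j_given_i(counts, q[1])
print(p_forward, p1_inverse)
```

With these made-up numbers the two estimates of the positive proportion disagree (about 0.32 against about 0.14). The cross-classified sample here is half positive, so its P(i|j) rates carry that prevalence into the first estimate, whereas the P(j|i) rates do not depend on the prevalence of the sample they were estimated from. This is one way to see why only the first approach requires both samples to come from the same population.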
