Abstract
When sampling is done in clusters and the pools (of size k) within a cluster cannot be assumed independent, the Dorfman model for estimating the proportion under the binomial model is incorrect. The purpose of this paper is to propose a method for analyzing correlated binary data under the group testing framework. First, assuming that the probability that an individual is positive varies across clusters according to a beta distribution, we derive an analytic expression for the probability of a positive pool and for the correlation between two pools in the same cluster. Second, we derive the exact probability mass function of the number of positive pools in each cluster, which should be used to obtain the maximum likelihood estimate (MLE) of the proportion of individuals with a positive outcome. However, this exact MLE is computationally expensive. For this reason, we propose another estimator, based on the beta-binomial model, that gives an approximate MLE of the proportion of interest. In a simulation study, the approximate estimator produced results very close to the exact MLE, with the advantage of being computationally more efficient.
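The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes the cluster-level probability p follows Beta(alpha, beta), that individuals are independent given p, and that pools within a cluster are disjoint; under those assumptions the probability that m pools of size k are all negative is E[(1-p)^(mk)] = B(alpha, beta + mk) / B(alpha, beta). The function names and the (alpha, beta) parameterization are illustrative and may differ from the notation used in the paper.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_all_negative(alpha, beta, k, m=1):
    """P(m disjoint pools of size k are all negative) = E[(1 - p)^(m*k)],
    with p ~ Beta(alpha, beta) shared by the whole cluster (assumption)."""
    return exp(log_beta(alpha, beta + m * k) - log_beta(alpha, beta))


def pool_positive_prob(alpha, beta, k):
    """Marginal probability that a single pool of size k tests positive."""
    return 1.0 - prob_all_negative(alpha, beta, k, m=1)


def pool_correlation(alpha, beta, k):
    """Correlation between the indicators of two pools in the same cluster."""
    q1 = prob_all_negative(alpha, beta, k, m=1)  # one pool negative
    q2 = prob_all_negative(alpha, beta, k, m=2)  # both pools negative
    # Cov of the negative indicators equals Cov of the positive indicators.
    return (q2 - q1 * q1) / (q1 * (1.0 - q1))


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(pool_positive_prob(alpha=0.5, beta=9.5, k=10))
    print(pool_correlation(alpha=0.5, beta=9.5, k=10))
```

With k = 1 the correlation reduces to the usual intra-cluster correlation of the beta-binomial model, 1 / (alpha + beta + 1), which is a quick sanity check on the sketch.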
Highlights
The group testing model of Dorfman [1] is effective for reducing the number of diagnostic tests because, instead of performing n individual diagnostic tests, it only requires g = n/k pooled tests when retesting is not done
When we obtained a sample of N independent clusters from a finite population of clusters, we sampled individuals within each selected cluster and randomly allocated these individuals to n_l pools of k_l individuals each for the detection or estimation of a particular disease
For the purpose of estimation, it is important to use the probability mass function of the number of positive pools in a cluster derived in this study to correctly estimate the proportion of interest, because it takes into account the fact that the pools formed in each cluster are correlated
Summary
The group testing model of Dorfman [1] is effective for reducing the number of diagnostic tests because, instead of performing n individual diagnostic tests, it only requires g = n/k pooled tests when retesting is not done (where k is the pool size). Because plant samples are taken at different locations throughout a geographical region, or seed samples are taken from seed lots obtained from different regions, individual plants or seed lots are inherently clustered by design and share common characteristics [5]. It is therefore important to develop methods for analyzing pooled data that allow individuals to be correlated and that do not require the assumption of a homogeneous plant distribution, as the binomial model does.
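For contrast with the clustered setting, the standard binomial calculation can be sketched as follows. It assumes equal pool sizes, a perfect diagnostic test, and independent individuals with a common prevalence p, so that a pool is positive with probability 1 - (1 - p)^k and the MLE from y positive pools out of g is 1 - (1 - y/g)^(1/k). The function names are illustrative, and this is the independence-based estimator that the summary argues is inappropriate when pools within a cluster are correlated.

```python
def pools_needed(n, k):
    """Number of pooled tests g = n / k when n individuals are split
    into pools of size k and no retesting is done."""
    if k <= 0 or n % k != 0:
        raise ValueError("n must be a positive multiple of the pool size k")
    return n // k


def binomial_pool_mle(y, g, k):
    """MLE of the individual-level proportion under the binomial model,
    given y positive pools out of g pools of size k (independence assumed)."""
    if not 0 <= y <= g:
        raise ValueError("y must lie between 0 and g")
    return 1.0 - (1.0 - y / g) ** (1.0 / k)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    g = pools_needed(n=200, k=10)
    print(g, binomial_pool_mle(y=4, g=g, k=10))
```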