Abstract

When individuals are sampled in clusters and the pools (of size k) within a cluster cannot be assumed independent, the Dorfman model for estimating the proportion under the binomial model is incorrect. The purpose of this paper is to propose a method for analyzing correlated binary data under the group testing framework. First, assuming that the probability that an individual is positive varies according to a beta distribution, we derived an analytic expression for the probability of a positive pool and for the correlation between two pools in each cluster. Second, we derived the exact probability mass function of the number of positive pools in each cluster, which should be used to obtain the maximum likelihood estimate (MLE) of the proportion of individuals with a positive outcome. However, this MLE is computationally expensive. For this reason, we proposed another estimator, based on the beta-binomial model, for obtaining an approximate MLE of the proportion of interest. In a simulation study, the approximate estimator produced results very close to the exact MLE of the proportion of interest, with the advantage of being computationally more efficient.
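As an illustration of the first step, the sketch below computes the probability of a positive pool and the correlation between two pools of the same cluster. It is not the paper's derivation: it assumes a perfect test, that all individuals in a cluster share one positivity probability p drawn from Beta(a, b), and that individuals are independent given p. The function names and the parameters a and b are hypothetical. Under these assumptions E[(1 - p)^m] = B(a, b + m) / B(a, b), which is all the code uses.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def expected_neg_power(a, b, m):
    """E[(1 - p)^m] for p ~ Beta(a, b), i.e. B(a, b + m) / B(a, b)."""
    return math.exp(log_beta(a, b + m) - log_beta(a, b))


def pool_positive_prob(a, b, k):
    """Marginal probability that a pool of k individuals tests positive."""
    return 1.0 - expected_neg_power(a, b, k)


def pool_correlation(a, b, k):
    """Correlation between the outcomes of two pools of size k in one cluster."""
    theta = pool_positive_prob(a, b, k)
    # Both pools are negative when all 2k individuals are negative.
    both_negative = expected_neg_power(a, b, 2 * k)
    covariance = both_negative - (1.0 - theta) ** 2
    return covariance / (theta * (1.0 - theta))


if __name__ == "__main__":
    print(pool_positive_prob(1.0, 9.0, 5))
    print(pool_correlation(1.0, 9.0, 5))
```

With k = 1 the correlation reduces to the familiar beta-binomial intraclass correlation 1 / (a + b + 1), which is a convenient sanity check.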

Highlights

  • The group testing model of Dorfman [1] is effective for reducing the number of diagnostic tests because instead of performing n individual diagnostic tests, it only requires g = n/k pooled tests when retesting is not done

  • When we obtained a sample of N independent clusters from a finite population of clusters, we sampled individuals within each selected cluster and randomly allocated these individuals to n_l pools of size k_l individuals for the detection or estimation of a particular disease

  • For the purpose of estimation, it is important to use the probability mass function of the number of positive pools in a cluster derived in this study to correctly estimate the proportion of interest, because it takes into account the fact that the pools formed in each cluster are correlated
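To make the last highlight concrete, here is a minimal sketch of a probability mass function for the number of positive pools in one cluster. It is an illustration under stated assumptions, not necessarily the expression derived in the paper: a perfect test, one positivity probability p per cluster with p ~ Beta(a, b), and individuals independent given p. The names positive_pools_pmf, n_pools, a and b are hypothetical. Expanding (1 - q^k)^y with the binomial theorem, where q = 1 - p, gives an alternating sum of beta-function ratios.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def positive_pools_pmf(y, n_pools, k, a, b):
    """P(Y = y) positive pools out of n_pools pools of size k in one cluster.

    Assumes p ~ Beta(a, b) is shared by the whole cluster, so that
    P(Y = y) = C(n, y) * E[(1 - q^k)^y * q^(k (n - y))] with q = 1 - p.
    """
    total = 0.0
    for j in range(y + 1):
        # Exponent of q in the j-th term of the binomial expansion.
        m = k * (n_pools - y + j)
        moment = math.exp(log_beta(a, b + m) - log_beta(a, b))  # E[q^m]
        total += (-1) ** j * math.comb(y, j) * moment
    return math.comb(n_pools, y) * total


if __name__ == "__main__":
    pmf = [positive_pools_pmf(y, 5, 4, 1.0, 9.0) for y in range(6)]
    print(pmf, sum(pmf))
```

Because the pools share the same p, this distribution is more dispersed than a binomial with the same marginal pool probability. The alternating sum can also lose precision when the number of pools is large, so this form is best suited to small clusters.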

Summary

Introduction

The group testing model of Dorfman [1] is effective for reducing the number of diagnostic tests because, instead of performing n individual diagnostic tests, it requires only g = n/k pooled tests when retesting is not done (where k is the pool size). Because plant samples are taken at different locations throughout a geographical region, or seed samples are taken from seed lots obtained from different regions, individual plants or seed lots are inherently clustered by design and share common characteristics [5]. It is therefore important to develop methods for analyzing pooled data that allow individuals to be correlated and that do not require the assumption of a homogeneous plant distribution, as the binomial distribution does.
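For reference, the standard binomial estimator that the clustered setting calls into question can be sketched as follows. The sketch assumes a perfect test, independent pools and no retesting, so that a pool is positive with probability 1 - (1 - p)^k and the MLE from y positive pools out of g is 1 - (1 - y/g)^(1/k). The function names are hypothetical.

```python
def number_of_pools(n, k):
    """Number of pooled tests g = n / k when n individuals form pools of size k."""
    if k <= 0 or n % k != 0:
        raise ValueError("n must be a positive multiple of the pool size k")
    return n // k


def binomial_pool_mle(y, g, k):
    """MLE of the individual-level proportion p under the binomial model.

    y: number of positive pools, g: number of pools, k: pool size.
    """
    if not 0 <= y <= g:
        raise ValueError("y must lie between 0 and g")
    return 1.0 - (1.0 - y / g) ** (1.0 / k)


if __name__ == "__main__":
    g = number_of_pools(100, 5)  # 20 pooled tests instead of 100 individual tests
    print(g, binomial_pool_mle(4, g, 5))
```

This estimator treats every pool as an independent Bernoulli trial, which is the assumption that fails when pools are formed within the same cluster.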
