Estimation of Covid-19 Prevalence from Serology Tests: A Partial Identification Approach

Panagiotis Toulis

doi:10.2139/ssrn.3587738

Abstract

We propose a partial identification method to estimate COVID-19 prevalence in the US. Our data are results from antibody tests (serology tests) in a population sample, where the test parameters, such as the true/false positive rates, are unknown. Our method scans the entire parameter space and rejects parameter values using the joint data density as the test statistic. The key advantage of our method over more standard approaches is that it is valid in finite samples: it requires only independence of serology test results and does not rely on asymptotic arguments, normality assumptions, or other approximations. We use recent COVID-19 serology studies in the US and show that the parameter confidence set is generally wide and cannot yet support definite conclusions. Specifically, recent serology studies from California suggest a prevalence anywhere in the range 0%-6%, and are therefore inconclusive. However, this range could be narrowed down to 0.3%-1.8% if the actual false positive rate of the antibody test were near its empirical estimate (∼0.5%). In a study from New York State, COVID-19 prevalence is confidently estimated in the range 11%-18%, which also suggests significant geographic variation in COVID-19 exposure across the US. Combining all datasets yields a 3%-9% prevalence range. Overall, our results suggest that serology testing on a massive scale can give crucial information for future policy design, even when such tests are imperfect and their parameters unknown.
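
The scan-and-reject idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's actual implementation: the three-binomial data layout (a calibration study on known negatives, one on known positives, and a main population study), the function names, the parameter grid, and the counts are all hypothetical. The rejection rule used here, which rejects a parameter value when the exact probability of drawing data no more likely than the observed data is at most alpha, is one natural way to use the joint density as a test statistic; the paper's exact procedure may differ.

```python
from itertools import product
from math import comb


def binom_pmf(n, p):
    """Return [P(S = s) for s = 0..n] where S ~ Binomial(n, p)."""
    return [comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(n + 1)]


def density_pvalue(theta, data):
    """Exact P_theta( f(S | theta) <= f(s_obs | theta) ) for independent binomials.

    theta = (fpr, tpr, prev): false positive rate, true positive rate, prevalence.
    data  = ((n_neg, s_neg), (n_pos, s_pos), (n_main, s_main)): number of tests
            and number of positive results in each (hypothetical) study.
    """
    fpr, tpr, prev = theta
    (n_neg, s_neg), (n_pos, s_pos), (n_main, s_main) = data
    # Probability that a randomly sampled person in the main study tests positive.
    pos_rate = prev * tpr + (1 - prev) * fpr
    pmfs = [binom_pmf(n_neg, fpr), binom_pmf(n_pos, tpr), binom_pmf(n_main, pos_rate)]
    f_obs = pmfs[0][s_neg] * pmfs[1][s_pos] * pmfs[2][s_main]
    cutoff = f_obs * (1 + 1e-12)  # small tolerance so floating-point ties count
    # Sum the joint density over every outcome that is no more likely than s_obs.
    return sum(a * b * c for a, b, c in product(*pmfs) if a * b * c <= cutoff)


def confidence_set(data, grid, alpha=0.05):
    """Keep every grid point theta that the density test does not reject."""
    return [theta for theta in grid if density_pvalue(theta, data) > alpha]


# Hypothetical counts, chosen small so that exact enumeration stays cheap.
data = ((20, 0), (10, 9), (40, 4))
grid = list(product([0.0, 0.02, 0.05], [0.8, 0.9, 1.0], [0.0, 0.05, 0.1, 0.2, 0.9]))
kept = confidence_set(data, grid)
prevalences = [prev for _, _, prev in kept]
print("prevalence range on this grid:", min(prevalences), "to", max(prevalences))
```

Because the p-value is computed by exact enumeration rather than by a normal approximation, the retained set has the stated coverage for any sample size under the independence assumption, at the cost of a search whose size grows with the grid resolution and the study sizes. The reported prevalence range is simply the projection of the retained parameter set onto the prevalence coordinate.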

Full Text