Decision letter: Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys

Isabel Rodriguez-Barraquer,Andrew Azman,Miles P Davenport,Sereina Herzog

doi:10.7554/elife.64206.sa1

Abstract

Article Figures and data Abstract Introduction Materials and methods Results Discussion Appendix 1 Data availability References Decision letter Author response Article and author information Metrics Abstract Establishing how many people have been infected by SARS-CoV-2 remains an urgent priority for controlling the COVID-19 pandemic. Serological tests that identify past infection can be used to estimate cumulative incidence, but the relative accuracy and robustness of various sampling strategies have been unclear. We developed a flexible framework that integrates uncertainty from test characteristics, sample size, and heterogeneity in seroprevalence across subpopulations to compare estimates from sampling schemes. Using the same framework and making the assumption that seropositivity indicates immune protection, we propagated estimates and uncertainty through dynamical models to assess uncertainty in the epidemiological parameters needed to evaluate public health interventions and found that sampling schemes informed by demographics and contact networks outperform uniform sampling. The framework can be adapted to optimize serosurvey design given test characteristics and capacity, population demography, sampling strategy, and modeling approach, and can be tailored to support decision-making around introducing or removing interventions. Introduction Serological testing is a critical component of the response to COVID-19 as well as to future epidemics. Assessment of population seropositivity, a measure of the prevalence of individuals who have been infected in the past and developed antibodies to the virus, can address gaps in knowledge of the cumulative disease incidence. This is particularly important given inadequate viral diagnostic testing and incomplete understanding of the rates of mild and asymptomatic infections (Sutton et al., 2020). In this context, serological surveillance has the potential to provide information about the true number of infections, allowing for robust estimates of case and infection fatality rates (Fontanet et al., 2020) and for the parameterization of epidemiological models to evaluate the possible impacts of specific interventions and thus guide public health decision-making. The proportion of the population that has been infected by, and recovered from, the coronavirus causing COVID-19 will be a critical measure to inform policies on a population level, including when and how social distancing interventions can be relaxed, and the prioritization of vaccines (Bubar et al., 2021). Individual serological testing may allow low-risk individuals to return to work, school, or university, contingent on the immune protection afforded by a measurable antibody response (Weitz et al., 2020; Larremore, 2020). At a population level, however, methods are urgently needed to design and interpret serological data based on testing of subpopulations, including convenience samples such as blood donors (Valenti et al., 2020; Erikstrup et al., 2021; Fontanet et al., 2020) and neonatal heel sticks, to reliably estimate population seroprevalence. Three sources of uncertainty complicate efforts to learn population seroprevalence from subsampling. First, tests may have imperfect sensitivity and specificity, and studies that do not adjust for test imperfections will produce biased seroprevalence estimates. Complicating this issue is the fact that sensitivity and specificity are, themselves, estimated from data (Larremore and Fosdick, 2020; Gelman and Carpenter, 2020), which can lead to statistical confusion if uncertainty is not correctly propagated (Bendavid et al., 2020). Second, the population sampled will likely not be a representative random sample (Bendavid et al., 2020), especially in the first rounds of testing, when there is urgency to test using convenience samples and potentially limited serological testing capacity. Third, there is uncertainty inherent to any model-based forecast that uses the empirical estimation of seroprevalence, regardless of the quality of the test, in part because of the uncertain relationship between seropositivity and immunity (Tan et al., 2020; Ward et al., 2020). A clear evidence-based guide to aid the design of serological studies is critical to policy makers and public health officials both for estimation of seroprevalence and forward-looking modeling efforts, particularly if serological positivity reflects immune protection. To address this need, we developed a framework that can be used to design and interpret cross-sectional serological studies, with applicability to SARS-CoV-2. Starting with results from a serological survey of a given size and age stratification, the framework incorporates the test’s sensitivity and specificity and enables estimates of population seroprevalence that include uncertainty. These estimates can then be used in models of disease spread to calculate the effective reproductive number Reff, the transmission potential of SARS-CoV-2 under partial immunity, forecast disease dynamics, and assess the impact of candidate public health and clinical interventions. Similarly, starting with a pre-specified tolerance for uncertainty in seroprevalence estimates, the framework can be used to optimize the sample size and allocation needed. This framework can be used in conjunction with any model, including ODE models (Kissler et al., 2020a; Weitz et al., 2020), agent-based simulations (Ferguson et al., 2020), or network simulations (St-Onge et al., 2019), and can be used to estimate Reff or to simulate transmission dynamics. Materials and methods Design and modeling framework Request a detailed protocol We developed a framework for the design and analysis of serosurveys in conjunction with epidemiological models (Figure 1), which can be used in two directions. In the forward direction, starting from serological data, one can estimate seroprevalence. While valuable on its own, seroprevalence can also be used as the input to an appropriate model to update forecasts or estimate the impacts of interventions. In the reverse direction, sample sizes can be calculated to yield seroprevalence estimates with a desired level of uncertainty and efficient sampling strategies can be developed based on prospective modeling tasks. The key methods include seroprevalence estimation, propagation of uncertainty through models, and model-informed sample size calculations. Figure 1 Download asset Open asset Framework for estimating seroprevalence and epidemiological parameters and the associated uncertainty, and for designing seroprevalence studies. Bayesian inference of seroprevalence Request a detailed protocol To integrate uncertainty arising from test sensitivity and specificity, we used a Bayesian model to produce a posterior distribution of seroprevalence that incorporates uncertainty associated with a finite sample size (Figure 1, green annotations). We denote the posterior probability that the true population seroprevalence is equal to θ, given test outcome data X and test sensitivity and specificity characteristics, as Pr⁢(θ∣X,se,sp). Because sample size and outcomes are included in X, and because test sensitivity and specificity are included in the calculations, this posterior distribution over θ appropriately handles uncertainty due to limited sample sizes and an imperfect testing instrument, and can be used to produce a point estimate of seroprevalence or a posterior credible interval. The model and sampling algorithm are fully described in Appendix A1. Sampling frameworks for seropositivity estimates are likely to be non-random and constrained to subpopulations. For example, convenience sampling (testing blood samples that were obtained for another purpose and are readily available) will often be the easiest and quickest data collection method (Winter et al., 2018). Two examples of such convenience samples are newborn heel stick dried blood spots, which contain maternal antibodies and thus reflect maternal exposure, and serum from blood donors (Valenti et al., 2020; Erikstrup et al., 2021; Fontanet et al., 2020). As a result, another source of statistical uncertainty comes from uneven sampling from a population. To estimate seropositivity for all subpopulations based on a given sample (stratified, convenience, or otherwise), we specified a Bayesian hierarchical model that included a common prior distribution on subpopulation-specific seropositivities θi (Appendix A1). In effect, this allowed seropositivity estimates from individual subpopulations to inform each other while still taking into account subpopulation-specific testing outcomes. The joint posterior distribution of all subpopulation prevalences was sampled using Markov chain Monte Carlo (MCMC) methods (Appendix A1). Those samples, representing posterior seroprevalence estimates for individual subpopulations, were then combined in a demographically weighted average to obtain estimates of overall seroprevalence, a process commonly known as poststratification (Little, 1993; Gelman and Carpenter, 2020). We focus the demonstrations and analyses of our methods on age-based subpopulations due to their integration into POLYMOD-type age-structured models (Mossong et al., 2008; Prem et al., 2017), but note that our mathematical framework generalizes naturally to other definitions of subpopulations, including those defined by geography (Fontanet et al., 2020; Nyc health testing data, 2020; Nisar et al., 2020; Malani et al., 2020). Propagating serological uncertainty through models Request a detailed protocol In addition to estimating core epidemiological quantities (Farrington and Whitaker, 2003; Farrington et al., 2001; Hens et al., 2012) or mapping out patterns of outbreak risk (Abrams et al., 2014), the posterior distribution of seroprevalence can be used as an input to any epidemiological model. Such models include the standard SEIR model, where the proportion seropositive may correspond to the recovered/immune compartment, as well as more complex frameworks such as an age-structured SEIR model incorporating interventions like school closures and social distancing (Davies et al., 2020; Figure 1, blue annotations). We integrated and propagated uncertainty in the posterior estimates of seroprevalence and uncertainty in model dynamics or parameters using Monte Carlo sampling to produce a posterior distribution of epidemic trajectories or key epidemiological parameter estimates (Figure 1, black annotations). Single-population SEIR model with social distancing and serology Request a detailed protocol To integrate inferred seroprevalence with uncertainty into a single-population SEIR model, we created an ensemble of SEIR model trajectories by repeatedly running simulations whose initial conditions were drawn from the seroprevalence posterior distribution. In particular, the seroprevalence posterior distribution was sampled, and each sample θ was used to inform the fraction of the population initially placed into the ‘recovered’ compartment of the model. Thus, uncertainty in posterior seroprevalence was propagated through model outcomes, which were measured as epidemic peak timing and peak height. Social distancing was modeled by decreasing the contact rate between susceptible and infected model compartments. A full description of the model and its parameters can be found in Appendix A2 and Supplementary file 1. Age-structured SEIR model with serology Request a detailed protocol To integrate inferred seroprevalence with uncertainty into an age-structured SEIR model, we considered a model with 16 age bins (0-4,5-9,…⁢75-79). This model was parameterized using country-specific age-contact patterns (Mossong et al., 2008; Prem et al., 2017) and COVID-19 parameter estimates (Davies et al., 2020). The model, due to Davies et al., 2020, includes age-specific clinical fractions and varying durations of preclinical, clinical, and subclinical infectiousness, as well as a decreased infectiousness for subclinical cases. A full description of the model and its parameters can be found in Appendix A2 and Supplementary file 1. As in the single-population SEIR model, seroprevalence with uncertainty was integrated into the age-structured model by drawing samples from seroprevalence posterior to specify the fraction of each subpopulation placed into ‘recovered’ compartments. Posterior samples were drawn from the age-stratified joint posterior distribution whose subpopulations matched the model’s subpopulations. For each set of posterior samples, the effective reproduction number Reff was computed from the model’s next-generation matrix. Thus, we quantified both the impact of age-stratified seroprevalence (assumed to be protective) on Reff as well as uncertainty in Reff. Serosurvey sample size and allocation for inference and modeling Request a detailed protocol The flexible framework described in Figure 1 enables the calculation of sample sizes for different serological survey designs. To calculate the number of tests required to achieve a seroprevalence estimate with a specified tolerance for uncertainty, and to determine optimal test allocation across subpopulations in the context of studying a particular intervention, we treated the estimate uncertainty as a framework output and then sought to minimize it by improving the allocation of samples (Figure 1, dashed arrow). Uniform allocation of samples to subpopulations is not always optimal. It can be improved by (i) increasing sampling in subpopulations with higher seroprevalence and (ii) sampling in subpopulations with higher relative influence on the quantity to be estimated. This approach, which we term model and demographics informed (MDI), allocates samples to subpopulations in proportion to how much sampling them would decrease the posterior variance of estimates, that is, ni∝xi⁢θi*⁢(1-θi*), where θi*=1-sp+θi⁢(se+sp-1) is the probability of a positive test in subpopulation i given test sensitivity (se), test specificity (sp), and subpopulation seroprevalence θi, and xi is the relative importance of subpopulation i to the quantity to be estimated. The sample allocation recommended by MDI varies depending on the information available and the quantity of interest. When the key quantity is overall seroprevalence, xi is the fraction of the population in subpopulation i. When the key quantity is total infections, the effective reproductive number, Reff, or another quantity derived from compartmental models with subpopulations, xi, is the ith entry of the principal eigenvector of the model’s next-generation matrix, after modification to include modeled interventions. In such scenarios, this approach balances the importance of sampling subpopulations due to their role in dynamics (xi) and higher variance in seroprevalence estimates themselves (θi*⁢(1-θi*)). If subpopulation prevalence estimates θi are unknown, sample allocation based solely on xi is recommended. These methods are derived in Appendix A3. Data sources Request a detailed protocol Age distribution of U.S. blood donors was drawn from a study of Atlanta donors (Shaz et al., 2011). Age distribution of U.S. mothers was drawn from the 2016 CDC Vital Statistics Report using Massachusetts as a reference state (Martin et al., 2018). Daily age-structured contact data were drawn from Prem et al., 2017. All data were represented using 5-year age bins, that is, (0-4, 5-9,…,74-79). For datasets with bins wider than 5 years, counts were distributed evenly into the 5-year bins. Serological test characteristics were collected from registrations with the U.S. Food and Drug Administration, 2021 and summarized in Supplementary file 1. No attempt was made to test or validate manufacturer claims, and point estimates of sensitivity and specificity were used that did not incorporate test calibration sample sizes (Gelman and Carpenter, 2020; Larremore and Fosdick, 2020). Demographic data for the U.S., India, and Switzerland (analyzed in the article) as well as other countries (provided in open-source code) were downloaded from the 2019 United Nations World Populations Prospects report (United Nations, 2019). Hypothetical survey samples were drawn based on comprehensive seroprevalence estimates from Geneva, Switzerland (Stringhini et al., 2020). Results Test sensitivity/specificity, sampling bias, and true seroprevalence influence the accuracy and robustness of estimates We simulated serological data from a single population with seroprevalence rates ranging from 1% to 50% using the reported sensitivity (90%) and specificity (>99.9%) of the Euroimmun SARS-CoV-2 IgG test (U.S. Food and Drug Administration, 2021; Supplementary file 1), and with the number of samples ranging from 100 to 5000. We constructed Bayesian posterior estimates of seroprevalence, finding that, when seroprevalence is 10% or lower, around 1000 samples are necessary to correctly estimate seroprevalence to within ±2% (Figure 2). Marketed tests with other characteristics also required around 1000 tests (Figure 2—figure supplement 1A, B) to achieve the same uncertainty levels, approaching the minimum sample size achieved by a theoretical test with perfect sensitivity and specificity (Figure 2—figure supplement 1C). Similar calculations for other test characteristics may be performed using the open-source tools that accompany this study (Open-source code repository and reproducible notebooks for this manuscript, 2020). In general, estimates were most uncertain when true seropositivity was near 50%, the number of samples was low, and/or test sensitivity/specificity were low. Figure 2 with 1 supplement see all Download asset Open asset Uncertainty of population seroprevalence estimates as a function of number of samples and true population rate. Uncertainty, represented by the width of 95% credible intervals, is presented as ± seroprevalence percentage points in (A) a contour plot and (B) for selected seroprevalence values, based on a serological test with 90% sensitivity and >99.9% specificity (Figure 2—figure supplement 1 depicts results for other sensitivity and specificity values). In total, 5000 samples are sufficient to estimate any seroprevalence to within a worst-case tolerance of ±1.4 percentage points (e.g., 20% ± 1.4% = [18.6%, 21.4%]), even with the imperfect test studied. Each point or pixel is averaged over 250 stochastic draws from the specified seroprevalence with the indicated sensitivity and specificity. Next, we tested the ability of the Bayesian hierarchical model to infer both population and subpopulation seroprevalence. We simulated serological data from subpopulations for which samples were allocated and with heterogeneous seroprevalence levels (Supplementary file 2) and average seroprevalence values between 5% and 50%. Test outcomes were randomly generated conditioning on the false positive and negative properties of the test being modeled (Supplementary file 1). Test allocations across subpopulations were specified in proportion to age demographics of blood donations, delivering mothers, uniformly across subpopulations, or according to an MDI allocation focused on minimized posterior uncertainty in Reff. Credible intervals of the resulting overall seroprevalence estimates were influenced by the age demographics sampled, with the most uncertainty in the newborn dried blood spots sample set, due to the narrow age range for the mothers (Figure 3). For such sampling strategies, which draw from only a subset of the population, our approach assumes that seroprevalence in each subpopulation does not dramatically vary and thus infers that seroprevalence in the unsampled bins is similar to that in the sampled bins but with increased uncertainty. Uncertainty was also influenced by the overall seroprevalence, such that the width of the 95% credible interval increased with higher seroprevalence for a given sample size. While test sensitivity and specificity also impacted uncertainty, central estimates of overall seropositivity were robust for sampling strategies that spanned the entire population. Note that the MDI sample allocation shown in Figure 3 was optimized to estimate the effective reproductive number Reff, and thus, while it performs well, it is slightly outperformed by uniform sampling when used to estimate overall seroprevalence. Figure 3 with 1 supplement see all Download asset Open asset Uncertainty of overall seroprevalence estimates from convenience and formal sampling strategies. Uncertainty, represented by the width of 95% credible intervals, is presented as ± seroprevalence percentage points, based on a serological test with 90% sensitivity and >99.9% specificity (Figure 3—figure supplement 1 depicts results for other sensitivity and specificity values). (A) Curves show the decrease in average CI widths for 15% seroprevalence, illustrating the advantages of using uniform and model and demographics informed (MDI) samples over convenience samples. (B) Contour plots show average CI widths for various total sample counts and overall seroprevalence ranging from 5% to 50%. Convenience samples derived from newborn blood spots (reflecting the age demographics of mothers) or U.S. blood donors improve with additional sampling but retain baseline uncertainty due to demographics not covered by the convenience sample. For the estimation of overall seroprevalence, uniform sampling is marginally superior to this example of the MDI sampling strategy, which was designed to optimize estimation of the effective reproductive number Reff. Each point or pixel is averaged over 250 stochastic draws from the specified seroprevalence with the indicated sensitivity and specificity. Seroprevalence estimates inform uncertainty in epidemic peak, timing, and reproductive number Figure 4 illustrates how the height and timing of peak infections varied in forward simulations under two serological sampling scenarios and two hypothetical social distancing policies for a basic SEIR framework parameterized using seroprevalence data. Uncertainty in seroprevalence estimates propagated through SEIR model outputs in stages: larger sample sizes at a given seroprevalence resulted in a smaller credible interval for the seroprevalence estimate, which improved the precision of estimates of both the height and timing of the epidemic peak. We note that seroprevalence estimates without correction for the sensitivity and specificity of the test resulted in biased estimates in spite of increasing precision with larger sample size (Figure 4C, D). Test characteristics also impacted model estimates, with more specific and sensitive tests leading to more precise estimates (Figure 4—figure supplement 1). Even estimations from a perfect test carried uncertainty corresponding to the size of the sample set (Figure 4—figure supplement 1). Figure 4 with 1 supplement see all Download asset Open asset Uncertainty in serological data produces uncertainty in simulated epidemic peak height and timing. Serological test outcomes for n=100 tests (A; red) and n=1000 tests (B; blue) produce (C, D) posterior seroprevalence estimates with quantified uncertainty with posterior means of 15.2% and 14.6%, respectively; estimates uncorrected for assay performance bias: 13.0% and 13.0%. (E, F) Samples from the seroprevalence posterior produce a distribution of simulated epidemic curves for scenarios of 25% and 50% social distancing (see Materials and methods), leading to uncertainty in (G) epidemic peak and (H) timing, which is mitigated in the n=1000 sample scenario. Boxplot whiskers span 1.5× IQR, boxes span central quartile, lines indicate medians, and outliers were suppressed. se, sensitivity; sp, specificity. Figure 5 illustrates how the Bayesian hierarchical model extrapolates seroprevalence values in sampled subpopulations, based on convenience samples from particular age groups or age-stratified serological surveys, to the overall population, with uncertainty propagated from these estimates to model-inferred epidemiological parameters of interest, such as the effective reproduction number Reff. Estimates from 1000 neonatal heel sticks or blood donations achieved more uncertain, but still reasonable, estimates of overall seroprevalence and Reff as compared to uniform or demographically informed sample sets (Figure 5). Here, convenience samples produced higher confidence estimates in the heavily sampled subpopulations, but high uncertainty estimates in unsampled populations through our Bayesian modeling framework. In all scenarios, our framework propagated uncertainty appropriately from serological inputs to estimates of overall seroprevalence (Figure 5I) or Reff (Figure 5J). Importantly, we note that the inferred posterior estimates shown in Figure 5 are derived from stochastically generated data, meaning that repeating this numerical experiment would produce different simulated test outcomes and therefore different inferred seroprevalence and Reff estimates whose accuracy will stochastically vary, as expected. Improved test sensitivity and specificity correspondingly improved estimation and reduced the number of samples required (i) to achieve the same credible interval for a given seroprevalence and (ii) estimates of Reff (Figure 5—figure supplements 1 and 2). Figure 5 with 4 supplements see all Download asset Open asset Convenience and formal samples provide serological and epidemiological parameter estimates. (A–D) For four sampling strategies, n=1000 tests were allocated to age groups with negative tests (gray outlines) and positive tests (colors) as shown, drawn stochastically based on seroprevalence estimates reflecting SARS-CoV-2 serosurvey outcomes from Geneva, Switzerland, as of May 2020 (Stringhini et al., 2020) for a test with 90% sensitivity and >99.9% specificity. The model and demographics informed (MDI) strategy shown was designed to optimize estimation of Reff. (E–H) Age-group seroprevalence estimates θi are shown as boxplots (boxes 90% CIs, whiskers 95% CIs); dots indicate the true values from which data were sampled (Stringhini et al., 2020). Note the decreased uncertainty for boxes with higher sampling rates. (I) Age-group seroprevalences were weighted by Swiss population demographics to produce overall seroprevalence estimates, shown as probability densities with 95% credible intervals shaded and highlighted with dashed lines. (J) Age-group seroprevalences were used to estimate the effective reproductive number (Reff) from an age-stratified transmission model under status quo ante contact patterns, shown as probability densities with 95% credible intervals shaded and highlighted with dashed lines, based on a basic reproductive number in the absence of population immunity (R0) of 2.5. Dashed lines indicate true values from which the data were sampled. Each distribution depicts inference outcomes from a single set of stochastically sampled data; no averaging is done. Note that although uniform and MDI sample allocation produces equivalently confident estimates of overall seroprevalence, MDI produces a more confident estimate of Reff because it allocates more samples to age groups most relevant to model dynamics. If the subpopulations in the convenience sample have systematically different seroprevalence rates from the general population, increasing the sample size may bias estimates (Figure 5—figure supplements 3 and 4) while simultaneously decreasing the widths of posterior credible intervals, producing higher confidence in estimates in spite of their bias. This may be avoided using data from other sources or by updating the prior distributions in the Bayesian model with known or hypothesized relationships between seroprevalence of the sampled and unsampled populations. In general, the magnitude of this type of bias is not possible to estimate without secondary sources of seroprevalence data, differentiating it from the avoidable biases that result from failing to post-stratify based on population demographics or adjust for the sensitivity and specificity of the test instrument. Strategic sample allocation improves estimates We used the MDI strategy to design a study that optimizes estimation of Reff and then tested the performance of the sample allocations against those resulting from blood donation and neonatal heel stick convenience sampling, as well as uniform sampling. As designed, MDI produced higher confidence posterior estimates (Figure 5J, Figure 5—figure supplement 2). Importantly, because the relative importance of subpopulations in a model varies based on the hypothetical interventions being modeled (e.g., the reopening of workplaces would place higher importance on the serological status of working-age adults), MDI sample allocation recommendations should be derived for multiple hypothetical interventions and then averaged to design a study from which the largest variety of high confidence results can be derived. To illustrate how such recommendations would work in practice, we computed MDI recommendations to optimize three scenarios for the contact patterns and demography of the U.S. and India, deriving a balanced sampling recommendation (Figure 6). Figure 6 Download asset Open asset Model and demographics informed (MDI) sample allocations vary by demographics and modeling needs. Bar charts depict recommended sample allocation for three objectives, reducing posterior uncertainty for (A, E) estimates of overall seroprevalence, (B, F) predictions from an age-structured model with status quo ante contact patterns, (C, G) predictions from an age-structured model with modified contacts representing, relative to pre-crisis levels, a 20% increase in home contact rates, closed schools,

Full Text