Epidemic waves of coronavirus disease 2019 (COVID-19) infections have often been associated with the emergence of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants. Rapid detection of growing genomic variants can therefore serve as a predictor of future waves, enabling timely implementation of countermeasures such as non-pharmaceutical interventions (e.g. social distancing), additional vaccination (e.g. booster campaigns), or adjustments to healthcare capacity. The large volume of SARS-CoV-2 genomic sequence data produced during the pandemic has provided a unique opportunity to explore the utility of these data for generating early warning signals (EWS). We developed an analytical pipeline, the Transmission Fitness Polymorphism Scanner (implemented in the R package mrc-ide/tfpscanner), that systematically explores all clades within a SARS-CoV-2 phylogeny to detect variants showing unusually high growth rates. We investigated the use of these cluster growth rates as the basis for a variety of statistical time series to serve as leading indicators of the epidemic waves in the UK between August 2020 and March 2022. We also compared the performance of these phylogeny-derived leading indicators with a range of non-phylogeny-derived leading indicators. Our experiments simulated real-time data generation and analysis. Using phylogenomic analysis, we identified leading indicators that would have generated EWS ahead of significant increases in COVID-19 hospitalisations in the UK over this period. Our results also show that EWS lead time is sensitive to the threshold set for the number of false positive (FP) EWS: longer lead times are often possible if more FP EWS are tolerated.
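The abstract does not specify how a cluster growth rate is computed or how an indicator series is turned into EWS. The following is a minimal sketch of one plausible reading and is not the tfpscanner implementation. It assumes that a cluster's logistic growth rate is the time slope of a binomial logistic regression of the cluster's share of sequences against a comparator set. It also assumes that an EWS is raised when an indicator crosses a fixed threshold upwards. The function names and the `max_lead`/`max_lag` matching windows are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def logistic_growth_rate(days: Sequence[float],
                         cluster_counts: Sequence[float],
                         comparator_counts: Sequence[float],
                         iterations: int = 50) -> float:
    """Per-day logistic growth rate of a cluster relative to a comparator set.

    Fits logit(p_t) = a + r * t by Newton-Raphson, where p_t is the probability
    that a sequence sampled on day t belongs to the cluster rather than the
    comparator, and returns the slope r. No finite estimate exists if the data
    are perfectly separable (cluster absent early, exclusively present later).
    """
    t_mean = sum(days) / len(days)
    ts = [t - t_mean for t in days]  # centring time stabilises the fit; r is unchanged
    a, r = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_r = 0.0              # gradient of the binomial log-likelihood
        h_aa = h_ar = h_rr = 0.0     # entries of the negative Hessian
        for t, k, m in zip(ts, cluster_counts, comparator_counts):
            n = k + m
            p = 1.0 / (1.0 + math.exp(-(a + r * t)))
            resid, w = k - n * p, n * p * (1.0 - p)
            g_a += resid
            g_r += resid * t
            h_aa += w
            h_ar += w * t
            h_rr += w * t * t
        det = h_aa * h_rr - h_ar * h_ar
        if det <= 0.0:
            break
        # Newton step: solve (negative Hessian) @ delta = gradient.
        d_a = (h_rr * g_a - h_ar * g_r) / det
        d_r = (h_aa * g_r - h_ar * g_a) / det
        a, r = a + d_a, r + d_r
        if abs(d_a) < 1e-10 and abs(d_r) < 1e-10:
            break
    return r


def early_warning_days(indicator: Sequence[float], threshold: float) -> List[int]:
    """Days on which a leading-indicator series crosses the threshold upwards."""
    return [i for i in range(1, len(indicator))
            if indicator[i - 1] <= threshold < indicator[i]]


def evaluate_ews(ews_days: Sequence[int], inflection_days: Sequence[int],
                 max_lead: int = 28, max_lag: int = 14
                 ) -> Tuple[List[Optional[int]], List[int]]:
    """Match EWS to wave inflection points (hypothetical matching windows).

    An EWS within [inflection - max_lead, inflection + max_lag] counts as a
    true positive for that wave; any EWS matching no wave is a false positive.
    Lead time is inflection day minus the first matching EWS day, so positive
    values are leads, negative values are lags and None marks a missed wave.
    """
    lead_times: List[Optional[int]] = []
    matched = set()
    for w in inflection_days:
        hits = [d for d in ews_days if w - max_lead <= d <= w + max_lag]
        lead_times.append(w - hits[0] if hits else None)
        matched.update(hits)
    false_positives = [d for d in ews_days if d not in matched]
    return lead_times, false_positives
```

Lowering `threshold` in `early_warning_days` tends to move crossings earlier, which lengthens lead times but also produces more crossings that match no wave. This mirrors the lead-time/FP trade-off described above. In the toy call `evaluate_ews([10, 50, 95], [60, 100])`, the signals on days 50 and 95 match the two waves with lead times of 10 and 5 days, and the day-10 signal is a false positive.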
On the basis of maximising lead time and minimising the number of FP EWS, the best-performing leading indicators that we identified, amongst a set of 1.4 million candidates, were the maximum logistic growth rate (LGR) amongst clusters of the dominant Pango lineage and the mean simple LGR across a broader set of clusters. For the former, the time between the EWS and the wave inflection point (a conservative measure of wave start date) for the seven waves ranged from a 20-day lead to a 7-day lag, with a mean lead time of 5.4 days. The maximum number of FP EWS generated prior to a true positive (TP) EWS was two, and this occurred for only two of the seven waves in the period. The mean simple LGR across a broader set of clusters also performed well in terms of lead time, but with slightly more FP EWS. As a result of the significant surveillance effort during the pandemic, the early detection of the SARS-CoV-2 variants of concern Alpha, Delta, and Omicron provided some of the first examples in which timely detection and characterisation of pathogen variants were used to tailor the public health response. The success of our method in generating EWS from phylogenomic analysis of SARS-CoV-2 in the UK suggests that it may be a worthwhile addition to existing surveillance strategies. The method may also be translatable to other countries and/or regions, and to other pathogens with large-scale and rapid genomic surveillance. This research was funded in whole, or in part, by the Wellcome Trust (220885_Z_20_Z). For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
KOD, OB, VBF and EMV acknowledge funding from the MRC Centre for Global Infectious Disease Analysis (reference MR/X020258/1), which is jointly funded by the UK Medical Research Council (MRC) and the UK Foreign, Commonwealth & Development Office (FCDO) under the MRC/FCDO Concordat agreement, and is also part of the EDCTP2 programme supported by the European Union. RMC acknowledges funding from the Wellcome Trust Collaborators Award (206298/Z/17/Z).