Learning from the COVID-19 pandemic is vital for preparing for future global health crises. Forecasting pandemic spread across countries in the presence of viral mutations, such as SARS-CoV-2 variant strains, remains a challenge. Previous studies indicate spatial and temporal regularities in the distribution of SARS-CoV-2 variants, yet two limitations persist: (1) static geographical patterns overlook dynamic changes and their causes, and (2) the association between temporal variations in SARS-CoV-2 lineages and epidemic outbreaks is underexplored. To address these gaps, we propose an analysis framework that uses metadata from more than 10 million SARS-CoV-2 genome sequences across 100 countries. Our framework identifies spatial heterogeneity patterns in the relative frequency of SARS-CoV-2 variants through clustering, examines the factors influencing this spatial heterogeneity using explainable machine learning, and, based on the resulting spatial patterns, analyzes the time lag between temporal variation in variant frequency and COVID-19 infection waves. Our findings demonstrate the following: (1) The distribution of SARS-CoV-2 variant strains among the 100 countries exhibits a spatial pattern characterized by geographic proximity, and this pattern changes over time. (2) Bilateral distance is consistently the dominant factor, while factors related to globalization and response policy contribute to nongeographic proximity and to temporal variation in the spatial pattern. (3) Leveraging this spatial pattern makes it possible to forecast the time lag (mainly ranging from 20 to 38 days) between the emergence of a new strain and the subsequent infection wave. Our study clarifies the spatial pattern and time lag effect of the dynamic distribution of SARS-CoV-2 variant strains. From a geographical perspective, it offers a spatial basis and temporal reference for early warning in future global public health crises.
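To make the time lag idea concrete, the sketch below estimates the delay between a rise in a variant's relative frequency and the following infection wave by scanning candidate lags and keeping the one with the highest Pearson correlation. This is an illustrative assumption, not necessarily the method used in the study: the function names `pearson` and `best_lag`, the 60-day search window, and the synthetic bell-shaped series are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

def best_lag(variant_freq, cases, max_lag=60):
    """Return (lag_in_days, correlation) for the lag that best aligns the two series.

    A lag of k compares variant_freq on day t with cases on day t + k.
    """
    if len(variant_freq) != len(cases):
        raise ValueError("series must have the same length")
    best_days, best_r = None, -math.inf
    for lag in range(max_lag + 1):
        a = variant_freq[: len(variant_freq) - lag]
        b = cases[lag:]
        if len(a) < 3:  # too few overlapping days to correlate
            break
        r = pearson(a, b)
        if not math.isnan(r) and r > best_r:
            best_days, best_r = lag, r
    return best_days, best_r

# Synthetic example (not real data): the case wave peaks 25 days after the variant.
days = range(200)
variant = [math.exp(-(((d - 80) / 15) ** 2)) for d in days]
cases = [1000 * math.exp(-(((d - 105) / 15) ** 2)) for d in days]

lag, r = best_lag(variant, cases)
print(lag)  # 25
```

On real surveillance data the series would be noisy and usually smoothed first, and the best lag would be computed per country or per spatial cluster rather than on a single pair of curves.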