The goal of this study is to propose and test a scalable framework in which machine learning (ML) algorithms predict near-term severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) case counts by incorporating real-time, dynamic public health data and evaluating its impact. Data used in this study include patient-level results, procurement, and location information for all SARS-CoV-2 tests reported in West Virginia through the state's mandatory reporting system from January 2021 to March 2022. We propose a method for incorporating and comparing widely available public health metrics within an ML framework, specifically a long short-term memory (LSTM) network, to forecast SARS-CoV-2 cases across various feature sets. Our approach provides better prediction of localized case counts and indicates how dynamic elements of the pandemic affect predictions, such as the mixture of viral variants in the population and the testing and vaccination rates that varied across eras of the pandemic. Using real-time public health metrics, including the estimated effective reproduction number (Rt) of multiple SARS-CoV-2 variants, vaccination rates, and testing information, significantly increased model accuracy during the Omicron and Delta periods, yielding more precise forecasts of daily case counts at the county level. This work provides insights into the influence of various features on predictive performance in rural and non-rural areas. Our proposed framework combines available public health metrics with operational data on testing, vaccination, and the current mixture of viral variants in the population, providing a foundation for pairing dynamic public health metrics with ML models to deliver forecasting and insights in healthcare domains. It also shows the importance of developing and deploying ML frameworks in rural settings.
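The abstract does not describe how the daily metrics are fed to the LSTM, so the following is only a minimal sketch under stated assumptions. It arranges one county's daily series into fixed-length lookback windows of shape (lookback × number of features), each paired with the case count a chosen number of days ahead. This is the usual sequence input for an LSTM forecaster. The feature names (`tests`, `vaccination_rate`, `rt_delta`, `rt_omicron`), the `FEATURE_SETS` groupings, and the `make_windows` helper are hypothetical illustrations, not the study's actual features or code, and model training is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature sets for comparing predictive performance; the study's
# actual feature definitions are not given in the abstract.
FEATURE_SETS: Dict[str, List[str]] = {
    "cases_only": ["cases"],
    "cases_testing": ["cases", "tests"],
    "full": ["cases", "tests", "vaccination_rate", "rt_delta", "rt_omicron"],
}


def make_windows(
    series: Dict[str, Sequence[float]],
    features: List[str],
    lookback: int,
    horizon: int = 1,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (lookback x n_features) input windows and the case count
    `horizon` days after each window's last day as the target."""
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    if "cases" not in series:
        raise KeyError("series must contain a 'cases' column (the target)")
    n_days = len(series["cases"])
    # All features must cover the same days as the target series.
    for name in features:
        if len(series[name]) != n_days:
            raise ValueError(
                f"feature {name!r} has length {len(series[name])}, expected {n_days}"
            )
    X: List[List[List[float]]] = []
    y: List[float] = []
    # The last window start is chosen so the target index is n_days - 1.
    for start in range(n_days - lookback - horizon + 1):
        end = start + lookback  # exclusive end of the input window
        window = [[float(series[name][day]) for name in features] for day in range(start, end)]
        X.append(window)
        y.append(float(series["cases"][end + horizon - 1]))
    return X, y


if __name__ == "__main__":
    # Made-up 10-day series for a single county, for illustration only.
    toy = {
        "cases": [5, 7, 6, 9, 12, 15, 14, 18, 21, 25],
        "tests": [50, 55, 52, 60, 70, 75, 72, 80, 90, 95],
        "vaccination_rate": [0.40, 0.40, 0.41, 0.41, 0.42, 0.42, 0.43, 0.43, 0.44, 0.44],
        "rt_delta": [1.1] * 10,
        "rt_omicron": [0.9] * 10,
    }
    for set_name, feats in FEATURE_SETS.items():
        X, y = make_windows(toy, feats, lookback=7)
        print(set_name, len(X), len(X[0]), len(X[0][0]), y)
```

All feature sets share the same targets and differ only in the width of each window. A model trained on each set can therefore be compared on identical forecast days, which is one plausible way to measure the contribution of testing, vaccination, and variant-specific Rt inputs.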