Abstract

The coronavirus disease 2019 (COVID-19) spread rapidly across the world since its appearance in December 2019. This data set creates one-, three-, and seven-day forecasts of the COVID-19 pandemic's cumulative case counts at the county, health district, and state geographic levels for the state of Virginia. Forecasts are created over the first 46 days of reported COVID-19 cases using the cumulative case count data provided by The New York Times as of April 22, 2020. From this historical data, one-, three-, seven, and all-days prior to the forecast start date are used to generate the forecasts. Forecasts are created using: (1) a Naïve approach; (2) Holt-Winters exponential smoothing (HW); (3) growth rate (Growth); (4) moving average (MA); (5) autoregressive (AR); (6) autoregressive moving average (ARMA); and (7) autoregressive integrated moving average (ARIMA). Median Absolute Error (MdAE) and Median Absolute Percentage Error (MdAPE) metrics are created with each forecast to evaluate the forecast with respect to existing historical data. These error metrics are aggregated to provide a means for assessing which combination of forecast method, forecast length, and lookback length are best fits, based on lowest aggregated error at each geographic level.The data set is comprised of an R-Project file, four R source code files, all 1,329,404 generated short-range forecasts, MdAE and MdAPE error metric data for each forecast, copies of the input files, and the generated comparison tables. All code and data files are provided to provide transparency and facilitate replicability and reproducibility. This package opens directly in RStudio through the R Project file. The R Project file removes the need to set path locations for the folders contained within the data set to simplify setup requirements. This data set provides two avenues for reproducing results: 1) Use the provided code to generate the forecasts from scratch and then run the analyses; or 2) Load the saved forecast data and run the analyses on the stored data. Code annotations provide the instructions needed to accomplish both routes.This data can be used to generate the same set of forecasts and error metrics for any US state by altering the state parameter within the source code. Users can also generate health district forecasts for any other state, by providing a file which maps each county within a state to its respective health-district. The source code can be connected to the most up-to-date version of The New York Times COVID-19 dataset allows for the generation of forecasts up to the most recently reported data to facilitate near real-time forecasting.

Highlights

  • Application of One, Three, and Seven-Day Forecasts During Early Onset on the COVID-19 Epidemic Dataset Using Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average, and Naïve Forecasting Methods

  • Median Absolute Error (MdAE) and Median Absolute Percentage Error (MdAPE) metrics are created with each forecast to evaluate the forecast with respect to existing historical data

  • The data set is comprised of an R-Project file, four R source code files, all 1,329,404 generated short-range forecasts, MdAE and MdAPE error metric data for each forecast, copies of the input files, and the generated comparison tables

Read more

Summary

Coronavirus Mendeley data package folder

All of the calls to generate the Naïve, Growth, HW, MA(1), AR(1), ARMA(1,1), and ARIMA(p,d,q) forecasts occur within this file; the functions for generating these forecasts appear within the following files This code loads the stored copy of The New York Times cumulative case count data [5] as of April 23, 2020. A mapping of the VA county names to their corresponding health districts is loaded and formatted into time series data objects This file contains a majority of the functions needed to generate the county level forecasts. This option is provided as the generation of the forecasts is time consuming and resource intensive This file contains the functions needed to generate the health district and state level forecasts.

Data folder
1.10. Figures folder
Ethics Statement
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call