Abstract
Two predominant methodologies for forecasting temporal processes are traditional time series models and machine learning methods. This paper investigates the impact of time series cross‐validation (TSCV) on both approaches in a case study predicting the incidence of COVID‐19 from wastewater data. The TSCV framework outlined in the paper begins by engineering interpretable features hypothesized to be predictors of COVID‐19 incidence. Feature selection and hyperparameter tuning are then performed with TSCV to identify the features and hyperparameters that give the best model performance for a specific forecast horizon. This study found no evidence that TSCV improves forecasts from the auto‐regressive integrated moving average model with exogenous variables (TS‐ARIMAX), but the approach proved advantageous for gradient boosting machine forecasts (TS‐GBM). In Wyoming, for instance, TS‐GBM had a 34.9% improvement compared to naïve predictions, whereas GBM without TSCV had only a 15.6% improvement. TSCV also enhances interpretability for both TS‐ARIMAX and TS‐GBM models, because it selects specific features, such as lagged values of COVID‐19 cases, based on forecast performance and forecast length. Future research should explore the influence of stationarity and model averaging on the performance of TSCV in forecasting applications.
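To make the idea of horizon-specific TSCV concrete, the following is a minimal sketch, not the authors' implementation. It assumes an expanding-window (rolling-origin) scheme, mean absolute error as the score, and a percent improvement computed relative to the naïve error; the abstract specifies none of these choices. All function names are hypothetical, and the drift forecaster merely stands in for a candidate model such as ARIMAX or GBM.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], float]


def expanding_window_splits(
    n_obs: int, initial_train: int, horizon: int, step: int = 1
) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_indices, test_index) pairs for rolling-origin evaluation.

    The training window always starts at index 0 and grows by `step`;
    the test point lies `horizon` steps past the end of the training window.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield list(range(origin)), origin + horizon - 1
        origin += step


def naive_forecast(history: Sequence[float], horizon: int) -> float:
    """Naive baseline: carry the last observed value forward."""
    return history[-1]


def drift_forecast(history: Sequence[float], horizon: int) -> float:
    """Stand-in candidate model (assumption): extrapolate the average slope."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + horizon * slope


def tscv_mae(
    series: Sequence[float], forecaster: Forecaster, initial_train: int, horizon: int
) -> float:
    """Mean absolute error of `forecaster` across all expanding-window splits."""
    errors = [
        abs(series[test] - forecaster([series[i] for i in train], horizon))
        for train, test in expanding_window_splits(len(series), initial_train, horizon)
    ]
    return sum(errors) / len(errors)


def percent_improvement(model_error: float, naive_error: float) -> float:
    """Relative error reduction versus the naive baseline, in percent (assumed definition)."""
    return 100.0 * (naive_error - model_error) / naive_error


# Usage on made-up numbers (not data from the paper).
toy_series = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 21.0, 22.0, 24.0, 27.0]
naive_err = tscv_mae(toy_series, naive_forecast, initial_train=4, horizon=2)
drift_err = tscv_mae(toy_series, drift_forecast, initial_train=4, horizon=2)
print(f"improvement over naive: {percent_improvement(drift_err, naive_err):.1f}%")
```

Because every split trains only on observations that precede the test point, the score reflects genuine out-of-sample performance at the chosen horizon. Comparing candidate feature sets or hyperparameters by this score is one way to read the selection step described above.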