Abstract

The performance of data-driven models depends on the quantity and quality of training samples. In many locations, historical incidence data are inadequate for accurately predicting dengue fever cases. This work aims to enhance temporally limited dengue case data by methodically adding epidemically relevant case data from nearby locations as predictors (features). A novel framework is presented for windowing incidence data and computing time-shifted correlation-based metrics to quantify feature relevance. The framework ranks incidence data of adjacent locations around a target by combining metrics based on correlation, spatial distance, and local prevalence. Recurrent neural network models achieve up to 33.6% accuracy improvement on average using the proposed method. These models achieve mean absolute error (MAE) values as low as 0.128 on [0,1] normalized incidence data for a municipality with the highest dengue prevalence in Brazil’s Espírito Santo. When predicting aggregate cases over geographical ecoregions, the models improve by 16.5% while using only 6.5% of ranked incidence data. This paper also presents two correlation window allocation methods: fixed-size and outbreak detection. Both perform comparably well, although the outbreak detection method uses less data for computations. The proposed framework is general and can be used to improve time-series predictions for many spatiotemporal datasets.

Highlights

  • Accurate time series prediction of dengue fever outbreaks can be useful in planning mitigation strategies for hundreds of tropical and subtropical regions around the world

  • To determine if a PIC_k is leading in a window, we find the location (θ) of the peak correlation as shown in (4)

  • In this work, we develop a method to select relevant incidence data from peripheral locations as features to improve the prediction of dengue fever outbreaks


Summary

Introduction

Accurate time series prediction of dengue fever outbreaks can be useful in planning mitigation strategies for hundreds of tropical and subtropical regions around the world. The prediction accuracy of such models depends on the quality and the quantity of training data. The availability of incidence data varies across regions. A data aggregation center in an area may not have adequate data to achieve an acceptable level of accuracy in out-of-sample projections. In such cases, selected incidence data from adjacent centers in the same region could improve model performance as additional features. We propose a framework with quantitative methodologies to rank and select nearby case data as supplementary features.
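To make the idea of ranking nearby case data concrete, the following is a minimal sketch of a time-shifted correlation ranking. It is not the paper's metric, which also combines spatial distance and local prevalence and operates on correlation windows. The function names `lagged_correlation` and `rank_neighbors`, the `max_lag` parameter, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def lagged_correlation(target, neighbor, max_lag):
    """Pearson correlation of target[t] with neighbor[t - lag], lag = 0..max_lag.

    Hypothetical helper; a positive peak lag means the neighbor leads the target.
    """
    target = np.asarray(target, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    corrs = []
    for lag in range(max_lag + 1):
        a = target[lag:]                      # target from time `lag` onward
        b = neighbor[:len(neighbor) - lag]    # neighbor shifted back by `lag`
        if a.std() == 0 or b.std() == 0:
            corrs.append(0.0)                 # correlation undefined for a flat series
        else:
            corrs.append(float(np.corrcoef(a, b)[0, 1]))
    return np.array(corrs)

def rank_neighbors(target, neighbors, max_lag=8):
    """Return (name, peak_lag, peak_corr) tuples, highest peak correlation first."""
    scored = []
    for name, series in neighbors.items():
        corrs = lagged_correlation(target, series, max_lag)
        peak_lag = int(np.argmax(corrs))      # location of the peak correlation
        scored.append((name, peak_lag, float(corrs[peak_lag])))
    return sorted(scored, key=lambda item: item[2], reverse=True)

# Synthetic weekly incidence (assumed data): one neighbor leads the target by 3 weeks.
rng = np.random.default_rng(0)
weeks = np.arange(120)
base = np.exp(-0.5 * ((weeks - 60) / 8.0) ** 2)   # a single outbreak-like bump
target = base
neighbors = {
    "leading_neighbor": np.roll(base, -3),        # same bump, 3 weeks earlier
    "noise_neighbor": rng.random(120),            # unrelated series
}
ranking = rank_neighbors(target, neighbors, max_lag=8)
```

In this toy setup the neighbor whose outbreak precedes the target ranks first, with its peak correlation at a lag of 3 weeks. A fuller version would restrict the computation to selected windows and weight the score by distance and prevalence, as the framework describes.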

