Abstract

Background: Traditional surveillance systems produce estimates of influenza-like illness (ILI) incidence rates, but with a 1- to 3-week delay. Accurate real-time monitoring systems for influenza outbreaks could be useful for making public health decisions. Several studies have investigated the possibility of using internet users’ activity data and different statistical models to predict influenza epidemics in near real time. However, very few studies have investigated hospital big data.

Objective: Here, we compared internet and electronic health records (EHRs) data and different statistical models to identify the best approach (data type and statistical model) for ILI estimates in real time.

Methods: We used Google data for internet data and the clinical data warehouse eHOP, which included all EHRs from Rennes University Hospital (France), for hospital data. We compared 3 statistical models: random forest, elastic net, and support vector machine (SVM).

Results: For the national ILI incidence rate, the best correlation was 0.98 and the mean squared error (MSE) was 866, both obtained with hospital data and the SVM model. For the Brittany region, the best correlation was 0.923 and the MSE was 2364, both obtained with hospital data and the SVM model.

Conclusions: We found that EHR data, together with historical epidemiological information (French Sentinelles network), allowed for accurate prediction of ILI incidence rates for all of France as well as for the Brittany region, and outperformed the internet data regardless of the statistical model used. Moreover, the performance of the two statistical models, elastic net and SVM, was comparable.

Highlights

  • Influenza is a major public health problem

  • We aim to find the best statistical model to estimate influenza incidence rates at the national and regional scales by using hospital big data (HBD) or internet data. As these models have been described in the literature, we focused on two machine learning algorithms, random forest (RF) and support vector machine (SVM), and a linear regression model, elastic net

  • We show the results we obtained with the four datasets and three models: RF, SVM, and elastic net with residuals fitted by an autoregressive integrated moving average model (ElasticNet+ARIMA)
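The abstract reports model performance as a correlation and a mean squared error (MSE) between estimated and observed ILI incidence rates. The following is a minimal sketch of how these two metrics could be computed, assuming the correlation is Pearson's coefficient (the text above does not specify the type). The function names and the weekly values are hypothetical and serve only as an illustration; they are not the study's code or data.

```python
import math


def mean_squared_error(observed, estimated):
    """Average of the squared differences between observed and estimated values."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


def pearson_correlation(observed, estimated):
    """Pearson correlation coefficient between two equal-length series."""
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_e)


# Hypothetical weekly ILI incidence rates, made up for illustration only.
observed = [50.0, 120.0, 300.0, 450.0, 280.0, 90.0]
estimated = [60.0, 110.0, 320.0, 430.0, 300.0, 100.0]

mse = mean_squared_error(observed, estimated)
r = pearson_correlation(observed, estimated)
print(f"MSE = {mse:.1f}, correlation = {r:.3f}")
```

A higher correlation and a lower MSE both indicate estimates that track the surveillance series more closely. MSE is expressed in squared units of the incidence rate, so values are only comparable between models evaluated on the same target series.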


Summary

Introduction

Background: Influenza is a major public health problem. Outbreaks cause up to 5 million severe cases and 500,000 deaths per year worldwide [1,2,3,4,5]. ILI surveillance networks produce estimates of ILI incidence rates, but with a 1- to 3-week delay due to the time needed for data processing and aggregation. This time lag is an issue for public health decision making [2,7]. Several studies have investigated the possibility of using internet users’ activity data and different statistical models to predict influenza epidemics in near real time.

Conclusions: We found that EHR data, together with historical epidemiological information (French Sentinelles network), allowed for accurate prediction of ILI incidence rates for all of France as well as for the Brittany region, and outperformed the internet data regardless of the statistical model used. The performance of the two statistical models, elastic net and SVM, was comparable.
