Abstract
Accurate, high-resolution tracking of influenza epidemics at the regional level helps public health agencies make informed and proactive decisions, especially in the face of outbreaks. Internet users’ online searches offer great potential for the regional tracking of influenza. However, due to the complex data structure and reduced quality of Internet data at the regional level, few established methods provide satisfactory performance. In this article, we propose a novel method named ARGO2 (2-step Augmented Regression with GOogle data) that efficiently combines publicly available Google search data at different resolutions (national and regional) with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) for accurate, real-time regional tracking of influenza. ARGO2 gives very competitive performance across all US regions compared with available Internet-data-based regional influenza tracking methods, and it has achieved 30% error reduction over the best alternative method that we numerically tested for the period of March 2009 to March 2018. ARGO2 is reliable and robust, with the flexibility to incorporate additional information from other sources and resolutions, making it a powerful tool for regional influenza tracking, and potentially for tracking other social, economic, or public health events at the regional or local level.
Highlights
Internet users’ online records contain the footprints of the activities of millions of individuals in nearly every aspect of life, and offer the potential for real-time tracking of public health and social events[1,2,3], including influenza epidemics[4,5], at the regional level[6,7,8]
To address the complex structure and lower quality of Internet data at the regional level, we introduce a novel method, ARGO2, that gives accurate and robust real-time %Influenza-like Illness (ILI) estimates at the regional level
We compare our estimates with the actual %ILI subsequently revealed by the Centers for Disease Control and Prevention (CDC) weeks later and evaluate the estimation accuracy using multiple metrics, including mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation with the true %ILI
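The four accuracy metrics named above can be computed as in the following minimal sketch. The function name `evaluate_estimates` and the dictionary keys are hypothetical, chosen for illustration; the sketch assumes the true %ILI values are strictly positive so that MAPE is well defined, and it reports MAPE as a fraction rather than a percentage.

```python
import numpy as np

def evaluate_estimates(estimate, truth):
    """Compare estimated %ILI with the later-revealed true %ILI.

    Hypothetical helper for illustration; assumes `truth` is strictly
    positive and both inputs have the same length.
    """
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    err = estimate - truth
    return {
        "mse": float(np.mean(err ** 2)),              # mean squared error
        "mae": float(np.mean(np.abs(err))),           # mean absolute error
        "mape": float(np.mean(np.abs(err) / truth)),  # mean absolute percentage error (as a fraction)
        "correlation": float(np.corrcoef(estimate, truth)[0, 1]),  # Pearson correlation
    }
```

Calling `evaluate_estimates(weekly_estimates, cdc_ili)` on two equal-length series returns all four metrics at once, which makes it straightforward to compare several methods over the same evaluation period.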
Summary
Internet users’ online records contain the footprints of the activities of millions of individuals in nearly every aspect of life, and offer the potential for real-time tracking of public health and social events[1,2,3], including influenza epidemics[4,5], at the regional level[6,7,8]. Digital flu detection utilizes statistical or mechanistic models to estimate and forecast current and future %ILI at national and/or regional levels based on information from Internet-derived data, such as Google search data, as well as traditional surveillance data, such as CDC’s ILI reports. To address the complex structure and lower quality of Internet data at the regional level, we introduce a novel method, ARGO2, that gives accurate and robust real-time %ILI estimates at the regional level. The two-step procedure of ARGO2 has the following features: (i) It automatically selects the most relevant search terms and filters out high-sparsity terms, which overcomes the lower quality of Google’s regional search data. It incorporates (ii) the lower-resolution, national %ILI estimate as the baseline, (iii) the short-term momentum of flu activity, and (iv) cross-regional influence (correlations) to boost accuracy in high-resolution, regional estimation. (v) It adopts a two-year sliding window for model training, which is intended to capture the evolution in people’s search patterns, Google’s search engine, epidemic activity, and other patterns that change over time[30].
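Features (i) and (v) can be illustrated with a minimal sketch: drop search terms that are mostly zero inside the training window, then refit a regression on only the most recent two years before each estimate. This is not the ARGO2 implementation. The ridge regression is a stand-in chosen to keep the example dependency-free, the 0.5 sparsity threshold and the penalty value are arbitrary assumptions, all function names are hypothetical, and the national baseline, momentum, and cross-regional terms (ii) to (iv) are omitted.

```python
import numpy as np

WINDOW_WEEKS = 104  # two-year sliding training window, as described in the text

def drop_sparse_terms(search, max_zero_fraction=0.5):
    """Return column indices of search terms that are not too sparse.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    zero_fraction = np.mean(search == 0, axis=0)
    return np.flatnonzero(zero_fraction <= max_zero_fraction)

def fit_ridge(X, y, penalty=1.0):
    """Closed-form ridge fit with an unpenalized intercept (stand-in model)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(X.shape[1]),
                           Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ coef
    return intercept, coef

def sliding_window_estimate(search, ili, week, window=WINDOW_WEEKS):
    """Estimate %ILI for `week` using only the `window` weeks before it.

    search: (n_weeks, n_terms) array of search volumes for one region.
    ili:    (n_weeks,) array of %ILI; only weeks before `week` are used.
    """
    if week < window:
        raise ValueError("not enough history for a full training window")
    train = slice(week - window, week)
    keep = drop_sparse_terms(search[train])   # filter terms within the window
    intercept, coef = fit_ridge(search[train][:, keep], ili[train])
    return float(intercept + search[week, keep] @ coef)
```

Because both the term filter and the fit are recomputed for every target week, the set of retained terms and their weights can drift over time, which is the point of using a sliding window rather than a single fit on all history.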