Abstract
Background: Influenza epidemics pose significant social and economic challenges in China. Internet search query data have been identified as a valuable source for detecting emerging influenza epidemics. However, selecting the search queries and choosing the prediction method remain crucial challenges for improving predictions. The purpose of this study was to explore the application of the Support Vector Machine (SVM) regression model to merging search engine query data with traditional influenza data.

Methods: The official monthly reported number of influenza cases in Liaoning province, China, from January 2011 to December 2015 was acquired from the China National Scientific Data Center for Public Health. Search queries potentially related to influenza over the corresponding period were identified from the Baidu Index, a publicly available search engine database. An SVM regression model was built for prediction, and its three parameters (C, γ, ε) were chosen by leave-one-out cross-validation (LOOCV) during model construction. The model's performance was evaluated using Root Mean Square Error (RMSE), Root Mean Square Percentage Error (RMSPE) and Mean Absolute Percentage Error (MAPE).

Results: In total, 17 search queries related to influenza were identified through the initial query selection approach and used to construct the SVM regression model: nine queries in the same month, three queries at a lag of one month, one query at a lag of two months and four queries at a lag of three months. The SVM model performed well with the parameters (C = 2, γ = 0.005, ε = 0.0001), based on the ensemble data integrating the influenza surveillance data and Baidu search query data.

Conclusions: The results demonstrated the feasibility of using internet search engine query data as a complementary data source for influenza surveillance and the efficiency of the SVM regression model in tracking influenza epidemics in Liaoning.
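The model-fitting procedure described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, an RBF kernel (suggested by the presence of γ, but not stated in the abstract), feature standardization, a hypothetical parameter grid, and synthetic data standing in for the 17 query features. The RMSPE and MAPE definitions are the common textbook ones and are returned as fractions rather than percentages.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def error_metrics(y_true, y_pred):
    """Return RMSE, RMSPE and MAPE (the last two as fractions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rel = err / y_true  # assumes no month has zero reported cases
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "rmspe": float(np.sqrt(np.mean(rel ** 2))),
        "mape": float(np.mean(np.abs(rel))),
    }


# Synthetic stand-in: 48 months x 17 query features (NOT the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 17))
y = 100.0 + 10.0 * X[:, 0] + rng.normal(scale=2.0, size=48)

# Assumed split: first 36 months for fitting, last 12 for evaluation.
X_train, y_train = X[:36], y[:36]
X_test, y_test = X[36:], y[36:]

# Hypothetical grid; the candidate values searched in the paper are not given.
param_grid = {
    "svr__C": [0.5, 2, 8],
    "svr__gamma": [0.005, 0.05],
    "svr__epsilon": [0.0001, 0.01],
}

# Standardization is an assumption; SVR with an RBF kernel is scale-sensitive.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Leave-one-out CV: each training month is held out once per parameter setting.
search = GridSearchCV(
    model,
    param_grid,
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

best_params = search.best_params_
metrics = error_metrics(y_test, search.predict(X_test))
```

Squared error is used as the cross-validation score because each leave-one-out fold holds a single observation, for which a score such as R² is undefined.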
Highlights
Seasonal influenza is a serious public health problem and remains rampant across the world.
To monitor infectious disease activity in a timely manner, numerous recent studies have drawn on online search query data or social media data, including the Google (Seo & Shin, 2017; Yang et al, 2017; Xu et al, 2017; Pollett et al, 2017), Yahoo (Polgreen et al, 2008), Naver (Shin et al, 2016), Daum (Woo et al, 2016; Seo et al, 2014) and Baidu (Guo et al, 2017b) search engines; the Twitter (Wagner et al, 2017; Kagashe, Yan & Suheryani, 2017; Allen et al, 2016; Yun et al, 2016) and Weibo (Fung et al, 2013; Zhang et al, 2015) social media platforms; Wikipedia (Hickmann et al, 2015; McIver & Brownstein, 2014); and hospital or clinicians' databases (Bouzille et al, 2018; Santillana et al, 2014).
This article constructs a forecasting model for influenza based on ensemble data that integrate traditional influenza case data with search data from Baidu, the most popular search engine in China.
Summary
Seasonal influenza is a serious public health problem and remains rampant across the world. The idea of applying internet search query data to infectious disease prediction originated with Ginsberg et al (2009), who presented a new method that provided nearly real-time surveillance of influenza-like illness and overcame the lag-time limitations of the traditional flu surveillance systems in the United States. This article constructs a forecasting model for influenza based on ensemble data that integrate traditional influenza case data with search data from Baidu, the most popular search engine in China. The SVM model performed well with the parameters (C = 2, γ = 0.005, ε = 0.0001), based on the ensemble data integrating the influenza surveillance data and Baidu search query data.
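One way to picture the ensemble data is as a table in which each month's case count is paired with search volumes taken from the same month or from one to three months earlier, matching the lags reported in the abstract. The sketch below is only an illustration under that assumption: the summary does not say exactly how the two sources were combined, and the function name, query names and numbers are all made up.

```python
def build_lagged_rows(cases, queries, lags):
    """Pair monthly case counts with query series shifted by per-query lags.

    cases:   list of monthly case counts
    queries: dict mapping query name -> list of monthly search volumes
    lags:    dict mapping query name -> lag in months (0 = same month)
    Returns (X, y); row for month t uses each query's value at month t - lag.
    """
    max_lag = max(lags.values())
    names = sorted(queries)  # fixed column order
    X, y = [], []
    # The first max_lag months are dropped: their lagged values do not exist.
    for t in range(max_lag, len(cases)):
        X.append([queries[name][t - lags[name]] for name in names])
        y.append(cases[t])
    return X, y


# Toy series with made-up numbers and hypothetical query names.
cases = [120, 150, 90, 60, 80, 200]
queries = {
    "query_a": [10, 14, 9, 5, 7, 20],  # used in the same month
    "query_b": [3, 4, 8, 2, 1, 6],     # used at a lag of one month
    "query_c": [1, 2, 3, 4, 5, 6],     # used at a lag of three months
}
lags = {"query_a": 0, "query_b": 1, "query_c": 3}

X, y = build_lagged_rows(cases, queries, lags)
```

With a maximum lag of three months, the first three months have no complete feature row, so a 60-month series would yield at most 57 usable rows under this construction.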