Abstract

Objective
To investigate whether Twitter data can be used as a proxy for the surveillance of the seasonal influenza epidemic in France, at both the national and regional levels.

Introduction
Social media such as Twitter are used today to disseminate health information, and also by people to share or discuss their own health. Based on this observation, recent studies have shown that Twitter data can be used to monitor trends in infectious diseases such as influenza. These studies were mainly carried out in the United States, where Twitter is very popular [1-4]. To our knowledge, no research has been conducted in France to determine whether Twitter data can serve as a complementary data source for monitoring the seasonal influenza epidemic.

Methods
For this exploratory study, an R program was developed for the collection, pre-processing (geolocation and classification) and analysis of Tweets related to influenza-like illness.

Collection
The Twitter Streaming API was used to collect French-language Tweets containing the terms "grippe", "grippal" or "grippaux" (French for "flu" and the adjective "flu-related", singular and plural), without specifying geolocation coordinates.

Pre-processing
To identify Tweets located in France, a combination of automated filters was implemented. The following were retained:
● Tweets with geolocation information in France (GPS coordinates, country code, country, place name)
● Tweets for which the place given in the user's profile matched a city, department or region of France
● Tweets with a France-related time zone, excluding any Tweet that reported a French time zone but a non-French place code.

Second, a support vector machine (SVM) classifier was used to filter noise out of the database. To train the classifier, 1,500 Tweets were randomly sampled. Each of these training Tweets was manually inspected and tagged as valid or invalid according to the likelihood that it indicated influenza-like illness. This hand-tagged training set was converted to a vector representation using term frequency-inverse document frequency (TF-IDF) scores. These TF-IDF vectors were then used as input to train the SVM. To evaluate the classifier's performance, accuracy, recall and F-measure were calculated on a random sample of 1,000 manually tagged Tweets.

Analysis
Data collected from August 8, 2016 to March 26, 2017 were compared with those of the French syndromic surveillance system SurSaUD® (OSCOUR® and SOS Médecins networks) [5] using Spearman's rank correlation coefficient.

Ethical considerations
In accordance with the requirements of the National Commission on Informatics and Liberty (CNIL), user account information was removed from the database, except for location variables. Usernames appearing in the text of Tweets were also deleted.

Results
Over the study period, the system collected 238,244 influenza-related Tweets, of which 130,559 were located in France. After a cleaning step, 22,939 Tweets were classified by the algorithm as indicating influenza-like illness (ILI). The classifier achieved 0.739 for accuracy, 0.725 for recall and 0.732 for F-measure. Figure 1 shows that the weekly number of ILI Tweets follows the same trend as the weekly numbers of emergency department (ED) visits and physician consultations for ILI. Regardless of the data source, Spearman's correlation coefficients were positive and statistically significant at the national level and for each region of France (Table 1).

Conclusions
This exploratory study showed that Twitter data can be used to monitor the seasonal influenza epidemic in France, at both the national and regional levels, as a complement to existing systems. The system needs to be improved in order to confirm the observed trends during the next influenza epidemic.
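
The three geolocation rules described in the Methods can be sketched as follows. This is a minimal illustration in Python, not the authors' R program: the flat dictionary keys (`country_code`, `country`, `user_location`, `time_zone`), the tiny gazetteer and the time zone list are all hypothetical stand-ins, and the GPS-coordinate check is omitted for brevity.

```python
import re

# Hypothetical, deliberately tiny gazetteer of French cities and regions.
FRENCH_PLACES = {"paris", "lyon", "marseille", "bretagne", "occitanie"}
# Assumed label for a France-related time zone.
FR_TIME_ZONES = {"Paris"}


def is_in_france(tweet):
    """Return True if a tweet (a plain dict) passes any of the three rules."""
    country_code = tweet.get("country_code")

    # Rule 1: explicit geolocation metadata pointing to France.
    if country_code == "FR" or tweet.get("country") == "France":
        return True

    # Rule 2: the free-text profile location names a known French place.
    profile = (tweet.get("user_location") or "").lower()
    tokens = set(re.findall(r"\w+", profile))
    if tokens & FRENCH_PLACES:
        return True

    # Rule 3: French time zone, unless a place code names another country.
    if tweet.get("time_zone") in FR_TIME_ZONES and country_code in (None, "FR"):
        return True

    return False


print(is_in_france({"user_location": "Lyon, France"}))
print(is_in_france({"time_zone": "Paris", "country_code": "CA"}))
```

Token matching against a gazetteer is crude (a profile reading "Paris, Texas" would pass rule 2), so a real filter would need a complete list of cities, departments and regions plus some disambiguation.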
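
To make the evaluation measures concrete, the sketch below computes precision, recall and F-measure from hand-tagged and predicted labels. It is an illustration in Python with invented labels, not the authors' code. One observation, offered as an inference rather than something the abstract states: the reported F-measure (0.732) equals the harmonic mean of the two other reported figures (0.739 and 0.725), which is what one would expect if the first figure behaves like precision (the share of Tweets classified as ILI that were truly ILI).

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = valid ILI Tweet)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical labels, invented for illustration only.
hand_tagged = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 1, 0, 1, 0, 0]
print(precision_recall_f1(hand_tagged, predicted))

# Consistency check on the figures reported in the Results.
print(round(f_measure(0.739, 0.725), 3))  # prints 0.732
```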
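
The comparison between weekly Tweet counts and syndromic surveillance counts relies on Spearman's rank correlation, which is the Pearson correlation of the ranked series. The following is a minimal stdlib-only Python sketch of that computation, assuming tied values receive average ranks. The weekly counts are invented for illustration and are not the study's data, and no significance test is included.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical weekly counts, invented for illustration only.
ili_tweets = [12, 15, 30, 80, 150, 140, 60, 20]
ed_visits = [200, 210, 400, 900, 1500, 1600, 700, 250]
print(round(spearman(ili_tweets, ed_visits), 3))  # prints 0.976
```

Because only ranks matter, the very different scales of the two series (tens of Tweets versus hundreds of visits) do not affect the coefficient.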
