Abstract

One of the most common applications of social networking data is in the field of health. This study investigates a method for detecting epidemic influenza from social networking data and compares the results with official data from health organizations. The goal is to find a threshold level for diagnosis using data mining methods. First, from 200,000 tweets examined, the tweets related to the flu virus were extracted with the Twitter API for specific locations (10 urban zones in the USA) during a specific period (fall 2018), and a database was built from them. Features were then extracted from the tweets using natural language processing techniques. After the feature matrix was generated, 5-fold cross-validation was used to train and test several classifiers. With optimal parameters assigned to each classifier, the support vector machine (SVM) performed best among the methods compared. Finally, for each geographical location and time period, the ratio of the number of tweets that the SVM identified as indicating the disease to the number of all influenza-related tweets was computed. These ratios were averaged and matched against the areas reported as experiencing epidemic influenza by the Centers for Disease Control and Prevention (CDC). This yielded a threshold of 5.1% for the ratio of tweets indicating the spread of the influenza virus across different geographical locations.
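The final thresholding step can be illustrated with a small calculation. The abstract does not state exactly which ratios enter the average, so this minimal sketch assumes that the ratio of SVM-positive tweets to all influenza-related tweets is computed per (location, time period) cell, and that the threshold is the mean ratio over the cells the CDC reported as epidemic. The classifier itself is not shown; its per-cell positive counts are taken as given. All names (`RegionPeriod`, `positive_ratio`, `estimate_threshold`, `flag_epidemics`) are hypothetical, and the counts are toy numbers, not the study's data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RegionPeriod:
    """Tweet counts for one (location, time period) cell; all field names are hypothetical."""
    region: str
    flu_related: int    # all influenza-related tweets collected for this cell
    svm_positive: int   # tweets the trained classifier labelled as reporting infection
    cdc_epidemic: bool  # True if the CDC reported epidemic influenza for this cell


def positive_ratio(cell: RegionPeriod) -> float:
    """Fraction of influenza-related tweets that the classifier flags as infection reports."""
    if cell.flu_related == 0:
        return 0.0  # no influenza-related tweets, so no evidence of spread
    return cell.svm_positive / cell.flu_related


def estimate_threshold(cells: list[RegionPeriod]) -> float:
    """Average the ratios of CDC-reported epidemic cells (an assumed reading of the method)."""
    epidemic_ratios = [positive_ratio(c) for c in cells if c.cdc_epidemic]
    if not epidemic_ratios:
        raise ValueError("no CDC-reported epidemic cells to calibrate against")
    return mean(epidemic_ratios)


def flag_epidemics(cells: list[RegionPeriod], threshold: float) -> list[str]:
    """Return the regions whose ratio reaches the calibrated threshold."""
    return [c.region for c in cells if positive_ratio(c) >= threshold]


# Toy counts for illustration only; these are not the study's data.
toy_cells = [
    RegionPeriod("A", flu_related=1000, svm_positive=80, cdc_epidemic=True),
    RegionPeriod("B", flu_related=400, svm_positive=24, cdc_epidemic=True),
    RegionPeriod("C", flu_related=900, svm_positive=18, cdc_epidemic=False),
]
threshold = estimate_threshold(toy_cells)
print(f"threshold = {threshold:.1%}")
print("flagged:", flag_epidemics(toy_cells, threshold))
```

With these toy counts the ratios are 8%, 6% and 2%, so the calibrated threshold is 7.0% and only region A is flagged. Region B, although labelled epidemic, falls below the mean, which shows that an average-based threshold trades some sensitivity for a single, easily applied cut-off.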
