Abstract

Social media sites are considered one of the most important sources of data in many fields, such as health, education, and politics. While surveys provide explicit answers to specific questions, social media posts contain the same answers implicitly in their text. This research aims to develop a method for extracting implicit answers from large tweet collections, and to demonstrate this method for an important concern: the problem of heart attacks. The approach is to collect tweets containing “heart attack” and then select from those the ones with useful information. Informational tweets are those which express real heart attack issues, e.g., “Yesterday morning, my grandfather had a heart attack while he was walking around the garden.” On the other hand, there are non-informational tweets such as “Dropped my iPhone for the first time and almost had a heart attack.” The starting point was to manually classify around 7000 tweets as either informational (11%) or non-informational (89%), thus yielding a labeled dataset to use in devising a machine learning classifier that can be applied to our large collection of over 20 million tweets. Tweets were cleaned and converted to a vector representation suitable for input to different machine-learning algorithms: deep neural networks, support vector machine (SVM), J48 decision tree and naïve Bayes. Our experimentation aimed to find the best algorithm for building a high-quality classifier. This involved splitting the labeled dataset, with two-thirds used to train the classifier and one-third used for evaluation, in addition to cross-validation. The deep neural network (DNN) classifier obtained the highest accuracy (95.2%). It also obtained the highest F1-scores: 73.6% and 97.4% for the informational and non-informational classes, respectively.
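The clean, vectorize, and classify steps described above can be illustrated with a minimal sketch. The abstract does not specify the cleaning rules or the vector representation, so the `clean` function, the bag-of-words counts, and the add-one smoothing below are all assumptions, and the four training tweets are a toy set (two quoted from the abstract, two invented for illustration). Naïve Bayes is used here only because it is the simplest of the four algorithms named; this is not the authors' implementation.

```python
import math
import re
from collections import Counter


def clean(tweet):
    """Assumed cleaning: lowercase, drop URLs and @mentions, keep word tokens."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet)


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # label -> number of tweets
        self.word_counts = {}               # label -> Counter of token counts
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(clean(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in clean(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            denom = sum(counts.values()) + len(self.vocab)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)         # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled set: the first and third tweets are quoted from the abstract,
# the second and fourth are invented for this illustration.
train = [
    ("Yesterday morning, my grandfather had a heart attack while he was "
     "walking around the garden.", "informational"),
    ("My uncle is in hospital after a heart attack last night", "informational"),
    ("Dropped my iPhone for the first time and almost had a heart attack.",
     "non-informational"),
    ("That plot twist almost gave me a heart attack lol", "non-informational"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("My grandfather is in hospital after a heart attack"))
```

A real run would replace the toy list with the roughly 7000 labeled tweets and hold out one-third of them for evaluation, as the abstract describes.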

Highlights

  • To help identify the heart attack problem from Twitter data, we define two types of tweets: informational and non-informational tweets. Informational tweets are those which express real heart attack issues, such as “Yesterday morning, my grandfather had a heart attack while he was walking around the garden”

  • The goal of this research is to classify tweets into two categories: informational and non-informational. Informational tweets are those which express real heart attack issues, while non-informational ones are not related to actual heart attacks

  • The highest accuracy (95.2%) was obtained using a deep neural network (DNN), which also obtained the highest F1-scores: 73.6% and 97.4% for the informational and non-informational classes, respectively
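Because only about 11% of the labeled tweets are informational, accuracy and per-class F1 can diverge, which is why both are reported. The sketch below computes them from label lists. The predictions are hypothetical counts invented to mirror the 11%/89% class balance on 100 tweets; they are not the paper's confusion matrix, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = 2*TP / (2*TP + FP + FN), treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical outcome on 100 tweets: 11 informational, 89 non-informational.
y_true = ["info"] * 11 + ["non"] * 89
# 8 informational tweets found, 3 missed, 2 false alarms, 87 correct rejections.
y_pred = ["info"] * 8 + ["non"] * 3 + ["info"] * 2 + ["non"] * 87

print(f"accuracy:  {accuracy(y_true, y_pred):.3f}")
print(f"F1 (info): {f1_for_class(y_true, y_pred, 'info'):.3f}")
print(f"F1 (non):  {f1_for_class(y_true, y_pred, 'non'):.3f}")
```

In this invented example a handful of errors on the minority class barely moves accuracy but lowers the informational F1 noticeably, which is the pattern an imbalanced dataset tends to produce.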


Summary

Introduction

Social media networks such as Facebook, Myspace and Twitter are considered among the most important means of communication between people [1]. Over the past decade, social media has developed into an important tool for gathering information and building solutions in several fields such as business, entertainment and crisis management in health care, science and politics [2]. Social media is also considered a significant source of information for health monitoring [3,4]. In this research project, we are interested in collecting information related to heart attack problems from social media sites. Many people use social media to share information related to

Future Internet 2021, 13, 19.

