Abstract

As a pedagogical demonstration of Twitter data analysis, a case study of HIV/AIDS-related tweets around World AIDS Day 2014 was presented. This study examined whether Twitter users from countries with different income levels responded differently to World AIDS Day. The performance of support vector machine (SVM) models as classifiers of relevant tweets was also evaluated. A total of 1,826 randomly sampled HIV/AIDS-related original tweets from November 30 through December 2, 2014 were manually coded. Logistic regression was applied to analyze the association between the World Bank-designated income level of users' self-reported countries and tweet content. To identify the optimal SVM model, 1,278 (70%) of the 1,826 sampled tweets were randomly selected as the training set, and the remaining 548 (30%) served as the test set. Another 180 tweets were separately sampled and coded as the held-out dataset. Compared with tweets from low-income countries, tweets from Organization for Economic Cooperation and Development (OECD) countries had 60% lower odds of mentioning epidemiology (adjusted odds ratio, aOR = 0.404; 95% CI: 0.166, 0.981) and three times the odds of mentioning compassion/support (aOR = 3.080; 95% CI: 1.179, 8.047). Tweets from lower-middle-income countries had 79% lower odds than tweets from low-income countries of mentioning HIV-affected sub-populations (aOR = 0.213; 95% CI: 0.068, 0.664). The optimal SVM model identified relevant tweets in the held-out dataset of 180 tweets with an F1 score of 0.72. This study demonstrated how students can be taught to analyze Twitter data using manual coding, regression models, and SVM models.
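The percentage statements in the abstract follow from the adjusted odds ratios: an aOR below 1 corresponds to (1 − aOR) × 100% lower odds. The sketch below reproduces that arithmetic and the sizes of a 70/30 random split using only the Python standard library. The function names, the random seed, and the integer IDs standing in for tweets are illustrative assumptions, not the study's actual code or data.

```python
import random


def percent_lower_odds(aor):
    """Percent reduction in odds implied by an odds ratio below 1."""
    return (1.0 - aor) * 100.0


def split_train_test(items, train_fraction=0.7, seed=0):
    """Randomly partition items into a training list and a test list."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Adjusted odds ratios reported in the abstract.
print(round(percent_lower_odds(0.404)))  # epidemiology, OECD vs. low-income
print(round(percent_lower_odds(0.213)))  # sub-populations, lower-middle vs. low-income

# Placeholder IDs stand in for the 1,826 coded tweets (hypothetical data).
train, test = split_train_test(range(1826))
print(len(train), len(test))
```

With these inputs the rounded reductions are 60 and 79 percent, and the split sizes are 1,278 and 548, which agree with the figures stated in the abstract.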

Highlights

  • 36.9 million people were living with human immunodeficiency virus (HIV) and 2 million people became newly infected with HIV in 2014 [1]

  • Compared with tweets from low-income countries, tweets from Organization for Economic Cooperation and Development (OECD) countries had 3.08 times the odds of mentioning HIV/AIDS compassion and support (aOR = 3.08; 95% CI: 1.18, 8.05) after controlling for mentions of World AIDS Day (WAD), HIV/AIDS epidemiology, and HIV/AIDS testing

  • The trained support vector machine (SVM) model classified tweets in the held-out dataset with 77% sensitivity, a positive predictive value of 68%, and an F1 score of 0.72
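The F1 score in the last highlight is the harmonic mean of the positive predictive value (precision) and the sensitivity (recall). A minimal Python sketch of that relationship, using the two percentages reported above; the function name `f1_from_precision_recall` is an illustrative assumption.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the highlights: PPV 68%, sensitivity 77%.
f1 = f1_from_precision_recall(precision=0.68, recall=0.77)
print(f"{f1:.2f}")  # prints 0.72
```

The result matches the reported F1 score of 0.72, so the three figures in the highlight are mutually consistent.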


Summary

Introduction

36.9 million people were living with human immunodeficiency virus (HIV) and 2 million people became newly infected with HIV in 2014 [1]. [...] people in the United States were estimated to be living with HIV in 2015, of whom approximately 15% [...]. Prevention of HIV infection and associated illnesses and deaths is one of the goals of the Healthy People 2020 initiative in the United States [3]. Social media, such as Twitter and Facebook, have become increasingly popular as a data source and a tool in public health for both epidemiologic surveillance and communication surveillance [4]. Many public health agencies use social media to promote healthy lifestyles and disease prevention.

Data 2019, 4, 84

