Research has demonstrated the negative impact of racism on health, yet the measurement of racial sentiment remains challenging. This article provides practical guidance on using social media data for measuring public sentiment. We describe the main steps of such research, including data collection, data cleaning, binary sentiment analysis, and visualization of findings. We randomly sampled 55,844,310 publicly available tweets from 1 January 2011 to 31 December 2021 using Twitter's Application Programming Interface. We restricted analyses to US tweets in English that used one or more of 90 race-related keywords. We used a Support Vector Machine, a supervised machine learning model, for sentiment analysis. The proportion of tweets referencing racially minoritized groups that were negative increased at the county, state, and national levels, with a 16.5% increase at the national level from 2011 to 2021. Tweets referencing Black and Middle Eastern people consistently had the highest proportion of negative sentiment compared with all other groups. Stratifying temporal trends by racial and ethnic groups revealed unique patterns reflecting historical events specific to each group, such as the killing of George Floyd and sentiment of posts referencing Black people, discussions of the border crisis near the 2018 midterm elections and anti-Latinx sentiment, and the emergence of COVID-19 and anti-Asian sentiment. This study demonstrates the utility of social media data as a quantitative means to measure racial sentiment over time and place. This approach can be extended to a range of public health topics to investigate how changes in social and cultural norms impact behaviors and policy. A supplemental digital video is available at http://links.lww.com/EDE/C91.
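As an illustration of the keyword restriction and the yearly proportion of negative tweets described above, the following is a minimal sketch, not the study's actual pipeline. The keyword set, the function names, and the toy records are all hypothetical; the `is_negative` flag is assumed to come from an upstream binary sentiment classifier such as the Support Vector Machine mentioned in the abstract.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins; the study used 90 race-related keywords.
KEYWORDS = {"asian", "black", "latinx"}


def mentions_keyword(text, keywords=KEYWORDS):
    """Return True if the text contains at least one keyword as a whole word."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return not tokens.isdisjoint(keywords)


def negative_share_by_year(tweets):
    """Map year -> proportion of keyword-matching tweets labeled negative.

    Each tweet is a (year, text, is_negative) tuple, where is_negative is
    assumed to be the output of a binary sentiment classifier.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for year, text, is_negative in tweets:
        if not mentions_keyword(text):
            continue  # drop tweets with no race-related keyword
        totals[year] += 1
        negatives[year] += int(is_negative)
    return {year: negatives[year] / totals[year] for year in sorted(totals)}


# Made-up placeholder records, for illustration only.
sample = [
    (2011, "example post mentioning Black communities", False),
    (2011, "example post mentioning Asian communities", True),
    (2011, "example post about the weather", False),
    (2021, "example post mentioning Latinx voters", False),
    (2021, "second example post mentioning Asian communities", True),
    (2021, "second example post mentioning Black communities", True),
]
shares = negative_share_by_year(sample)
```

Here `shares` maps each year to the fraction of keyword-matching tweets flagged negative; the same grouping could be keyed by county or state instead of year to mirror the geographic levels in the abstract.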