Abstract
Introduction: Early reports of COVID-19 cases and deaths may not accurately convey community-level concern about the pandemic during its early stages, particularly in the United States, where testing capacity was initially limited. Social media interaction may elucidate public reaction and communication dynamics about COVID-19 during this critical period, in which communities may have formed their initial conceptions of the pandemic's severity.

Methods: Tweets were collected from the Twitter public API stream, filtered for keywords related to COVID-19. A support vector machine (SVM) classifier trained on a pre-existing training set was used to obtain a larger set of geocoded tweets in which users self-reported COVID-19 symptoms, concerns, and experiences. We then assessed the longitudinal relationship between the identified tweets and the number of officially reported COVID-19 cases at the U.S. county level using linear and exponential regression. Changes in tweet activity, including geospatial clustering, were also assessed for the five most populous U.S. cities.

Results: From an initial dataset of 60 million tweets, we analyzed 459,937 tweets that contained COVID-19-related keywords and were geolocated to U.S. counties. The number of tweets increased throughout the study period, although there was variation between city centers and residential areas. Tweets identified as self-reports of COVID-19 symptoms or concerns appeared to be more predictive of active COVID-19 cases as temporal distance increased.

Conclusion: Results from this study suggest that social media communication dynamics during the early stages of a global pandemic may vary geospatially among different communities, and that targeted pandemic communication is warranted. User engagement on COVID-19 topics may also be predictive of future confirmed case counts, though further studies are needed to validate these findings.
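The abstract does not state the exact regression specification. The following is therefore only a minimal sketch under assumptions, not the authors' method. It assumes:

- A hypothetical daily series of classified tweets and reported cases for a single county.
- A linear model `cases = a + b * tweets`.
- An exponential model `cases = A * exp(k * tweets)`, fitted by least squares on log(cases).
- A hypothetical `lag` parameter that pairs tweets on day t with cases on day t + lag. This is one way to probe whether predictiveness changes with temporal distance.

All function names and the demo numbers are illustrative.

```python
import math


def fit_linear(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx  # requires at least two distinct x values
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r_squared


def fit_exponential(x, y):
    """Fit y = A * exp(k * x) by OLS on log(y); every y must be positive.

    Returns (A, k, r_squared), with r_squared measured on the log scale.
    """
    a, k, r_squared = fit_linear(x, [math.log(yi) for yi in y])
    return math.exp(a), k, r_squared


def lagged(tweets, cases, lag):
    """Pair the tweet count on day t with the reported case count on day t + lag."""
    if lag == 0:
        return list(tweets), list(cases)
    return list(tweets[:-lag]), list(cases[lag:])


if __name__ == "__main__":
    # Hypothetical daily counts for one county; not data from the study.
    tweets = [1, 2, 4, 7, 11, 16, 22, 29, 37, 46]
    cases = [1, 1, 2, 3, 5, 8, 14, 22, 32, 44]
    for lag in range(4):
        x, y = lagged(tweets, cases, lag)
        _, _, r2_lin = fit_linear(x, y)
        _, _, r2_exp = fit_exponential(x, y)
        print(f"lag={lag}  linear R^2={r2_lin:.3f}  exponential (log-scale) R^2={r2_exp:.3f}")
```

The exponential fit reports R² on the log scale, so its value is not directly comparable with the linear R². A fuller analysis would also need to handle zero-case days, for which the logarithm is undefined, and to pool or model many counties.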
Highlights
Reports of COVID-19 cases and deaths may not accurately convey community-level concern about the pandemic during its early stages, particularly in the United States, where testing capacity was initially limited.
In mid-March 2020, approximately 150,000 cases of coronavirus disease 2019 (COVID-19) had been confirmed globally, of which only about 2,000 had occurred in the United States [1].
The filter keywords were “corona outbreak,” “corona,” “anticorona,” “coronavirus,” “Wuhan virus,” “COVID,” “Wuhan pneumonia,” and “pneumonia of unknown cause.” These keywords were chosen on the basis of structured manual searches on Twitter that detected user-posted content related to the COVID-19 outbreak. Prior studies have validated them as able to identify tweets pertaining to COVID-19 conversations [29, 30].
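To illustrate how a keyword list like this can be applied to tweet text, here is a minimal sketch that assumes case-insensitive substring matching on the tweet body. This local filter is an illustrative assumption. It does not reproduce the Twitter streaming API's own matching rules, and the function names are hypothetical. Plain substring matching on a short keyword such as “corona” will also catch unrelated mentions, so matched tweets would still need downstream filtering.

```python
import re

# Filter keywords as listed in the text above.
KEYWORDS = [
    "corona outbreak", "corona", "anticorona", "coronavirus",
    "Wuhan virus", "COVID", "Wuhan pneumonia", "pneumonia of unknown cause",
]


def build_matcher(keywords):
    """Compile one case-insensitive regex matching any keyword as a substring."""
    # Try longer keywords first so "coronavirus" is reported instead of "corona".
    ordered = sorted(keywords, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)


MATCHER = build_matcher(KEYWORDS)


def is_covid_related(text):
    """True if the tweet text contains at least one filter keyword."""
    return MATCHER.search(text) is not None


def matched_keywords(text):
    """Return the lowercased keywords found in the text (non-overlapping matches)."""
    return {m.group(0).lower() for m in MATCHER.finditer(text)}


if __name__ == "__main__":
    # Hypothetical example tweets, not drawn from the study dataset.
    sample = [
        "Stocked up on groceries, worried about coronavirus",
        "Lovely weather at the beach today",
    ]
    for tweet in sample:
        print(is_covid_related(tweet), matched_keywords(tweet))
```

Compiling the list into a single alternation keeps each tweet to one regex pass. Ordering by length makes the most specific phrase win when keywords overlap.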
Summary
Reports of COVID-19 cases and deaths may not accurately convey community-level concern about the pandemic during its early stages, particularly in the United States, where testing capacity was initially limited. It was recommended that testing be reserved for individuals with relatively severe symptoms requiring hospitalization [8]. This may have produced an early outbreak period in which the spatial distribution of reported cases misrepresented pandemic-related concern, because true case counts were underreported. One approach to assessing underreporting of symptomatic individuals and possible COVID-19 cases is “infoveillance.” Infoveillance uses Internet and social media data to identify the distribution and determinants of disease-related concern, such as self-reported COVID-19 symptoms and lack of access to testing [9,10,11].