A Geo-Tagged COVID-19 Twitter Dataset for 10 North American Metropolitan Areas over a 255-Day Period

Sara Melotte, Mayank Kejriwal
Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001

doi:10.3390/data6060064

Abstract

One of the unfortunate findings from the ongoing COVID-19 crisis is the disproportionate impact the crisis has had on people and communities who were already socioeconomically disadvantaged. It has, however, been difficult to study this issue at scale and in greater detail using social media platforms like Twitter. Several COVID-19 Twitter datasets have been released, but their scope is very broad, both topically and geographically. In this paper, we present a more controlled and compact dataset that can be used to answer a range of research questions (especially in computational social science) without requiring extensive preprocessing or tweet-hydration from the earlier datasets. The proposed dataset comprises tens of thousands of geotagged (and in many cases, reverse-geocoded) tweets originally collected over a 255-day period in 2020 across 10 metropolitan areas in North America. Since there are socioeconomic disparities within these cities (sometimes to an extreme extent, as witnessed in ‘inner city neighborhoods’ in some of these cities), the dataset can be used to assess such disparities through a social media lens, in addition to comparing and contrasting behavior across cities.

Highlights

One of the unfortunate findings from the ongoing COVID-19 crisis is the disproportionate impact the crisis has had on people and communities who were already socioeconomically disadvantaged.
It has been difficult to study this issue in greater detail using social media sources like Twitter.
What is missing is a carefully controlled dataset that would enable computational social scientists in specific contexts to study the issue through a social media lens without much hassle.

Summary

In addition to its medical consequences, the ongoing COVID-19 crisis has revealed (if not exacerbated) deep inequalities in our society [1,2,3,4]. Our dataset is derived from the GeoCOV19Tweets dataset, which was originally obtained by filtering English tweets from the Twitter streaming API using a continuously updated, expansive list of keywords and hashtags [7]. Our primary goal in publishing this dataset is to enable social scientists and digital humanities scholars with a less technical background to study COVID-19 in metropolitan contexts, over a longitudinal period, through a social media lens. For this reason, our dataset is compact and places a high premium on accurate geotagging, the details of which are described in the sections that follow.
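The selection described above combines a keyword filter with a strong emphasis on location. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes each tweet carries exact point coordinates in (longitude, latitude) order and approximates a metropolitan area with a rectangular bounding box. The keyword list, the `BBox` extent, and the helper names (`matches_keywords`, `filter_tweets`) are all hypothetical and chosen only for illustration; the paper's own location-based filtering (including reverse-geocoding) may work differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical keyword list; the real GeoCOV19Tweets list is far larger
# and was continuously updated.
KEYWORDS: Tuple[str, ...] = ("covid", "coronavirus", "sars-cov-2")


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (a hypothetical metro extent)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        # Inclusive bounds: points on the edge count as inside.
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat


@dataclass(frozen=True)
class Tweet:
    text: str
    # (longitude, latitude); None when the tweet has no exact geotag.
    coordinates: Optional[Tuple[float, float]] = None


def matches_keywords(text: str, keywords: Sequence[str] = KEYWORDS) -> bool:
    """Case-insensitive substring match against any keyword."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def filter_tweets(tweets: Sequence[Tweet], bbox: BBox,
                  keywords: Sequence[str] = KEYWORDS) -> List[Tweet]:
    """Keep tweets that mention a keyword AND have coordinates inside bbox."""
    return [
        t for t in tweets
        if t.coordinates is not None
        and matches_keywords(t.text, keywords)
        and bbox.contains(*t.coordinates)
    ]


# Usage with a made-up extent and made-up tweets (not real data).
metro = BBox(min_lon=-10.0, min_lat=40.0, max_lon=-9.0, max_lat=41.0)
tweets = [
    Tweet("COVID testing site opened", (-9.5, 40.5)),  # keyword + inside: kept
    Tweet("Nice weather today", (-9.5, 40.5)),         # no keyword: dropped
    Tweet("Coronavirus update", (-8.0, 40.5)),         # outside box: dropped
    Tweet("covid cases rising", None),                 # no geotag: dropped
]
print([t.text for t in filter_tweets(tweets, metro)])  # ['COVID testing site opened']
```

A rectangle is the simplest possible stand-in for a metropolitan boundary; a real study would more likely test against administrative polygons or use reverse-geocoded place names, which this sketch does not attempt.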

Data Description

Preliminaries

Hydrating Tweets

Determining Tweet Origin

Reverse-Geocoding

Selecting Metropolitan Areas

Location-Based Filtering

Related Datasets

Ethical Considerations

Possible Compliance with FAIR

Statistical Summary

Statistics on Sentiment Scores

Statistics on Hashtags

Findings

Possible Use-Cases

Journal: Data
Publication Date: Jun 16, 2021
Citations: 9
License type: CC BY 4.0

Similar Papers

The growth of social media in science: Social media has evolved from a mere communication channel to an integral tool for discussion and research collaboration.
Philip Hunter
EMBO Reports | VOL. 21 | 23 Apr 2020

Twitter Archives and the Challenges of "Big Social Data" for Media and Communication Research
Jean Burgess ... Axel Bruns
M/C Journal | VOL. 15 | 11 Oct 2012

Going Viral: The 3 Rs of Social Media Messaging during Public Health Emergencies.
Bhavini Patel Murthy ... Tanya Telfair Leblanc
Health Security | VOL. 19 | 01 Feb 2021

The Influence of B to B Firms Use of Multiple Social Media Platforms on Relationship Sales Performance: An Institutional Perspective
Kaouther Kooli ... Mohamad Yassine Hammouda
Journal of Business-to-Business Marketing | VOL. 28 | 03 Apr 2021