Abstract

BackgroundTwitter provides various types of location data, including exact Global Positioning System (GPS) coordinates, which could be used for infoveillance and infodemiology (ie, the study and monitoring of online health information), health communication, and interventions. Despite its potential, Twitter location information is not well understood or well documented, limiting its public health utility.ObjectiveThe objective of this study was to document and describe the various types of location information available in Twitter. The different types of location data that can be ascertained from Twitter users are described. This information is key to informing future research on the availability, usability, and limitations of such location data.MethodsLocation data was gathered directly from Twitter using its application programming interface (API). The maximum tweets allowed by Twitter were gathered (1% of the total tweets) over 2 separate weeks in October and November 2011. The final dataset consisted of 23.8 million tweets from 9.5 million unique users. Frequencies for each of the location options were calculated to determine the prevalence of the various location data options by region of the world, time zone, and state within the United States. Data from the US Census Bureau were also compiled to determine population proportions in each state, and Pearson correlation coefficients were used to compare each state’s population with the number of Twitter users who enable the GPS location option.ResultsThe GPS location data could be ascertained for 2.02% of tweets and 2.70% of unique users. Using a simple text-matching approach, 17.13% of user profiles in the 4 continental US time zones were able to be used to determine the user’s city and state. Agreement between GPS data and data from the text-matching approach was high (87.69%). Furthermore, there was a significant correlation between the number of Twitter users per state and the 2010 US Census state populations (r ≥ 0.97, P < .001).ConclusionsHealth researchers exploring ways to use Twitter data for disease surveillance should be aware that the majority of tweets are not currently associated with an identifiable geographic location. Location can be identified for approximately 4 times the number of tweets using a straightforward text-matching process compared to using the GPS location information available in Twitter. Given the strong correlation between both data gathering methods, future research may consider using more qualitative approaches with higher yields, such as text mining, to acquire information about Twitter users’ geographical location.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call