Abstract

The movement of ideas and content between locations and languages is a crucial concern for researchers of the information age, and Twitter has emerged as a central, global platform on which hundreds of millions of people share knowledge and information. A variety of research has attempted to harvest locational and linguistic metadata from tweets to answer important questions about the 300 million tweets that flow through the platform each day. However, much of this work is carried out with only a limited understanding of how best to work with the spatial and linguistic contexts in which the information was produced. Furthermore, standard, well-accepted practices have yet to emerge. This article therefore studies the reliability of key methods used to determine the language and location of content on Twitter. It compares three automated language identification packages to Twitter's user interface language setting and to a human coding of languages in order to identify common sources of disagreement. The article also demonstrates that in many cases user-entered profile locations differ from the physical locations from which users are actually tweeting. Consequently, these open-ended, user-generated profile locations cannot serve as useful proxies for the physical locations from which information is published to Twitter.
