Abstract

Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity – especially with regard to race – plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified “netspeak” dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.
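The abstract's core modeling idea is a vector autoregression: each city's current word activity depends on every city's activity in the previous time step. The following is a deliberately simplified sketch of that idea. It assumes an observed (not latent) VAR(1) with two hypothetical cities and synthetic Gaussian data, fitted by ordinary least squares. It does not model Twitter's sampling rate or aggregate across words, and it is not the authors' estimation procedure. In the sketch, a nonzero off-diagonal entry `A[i][j]` means that city j's past activity helps predict city i's, which is one way to read "influence".

```python
import random


def simulate_var1(A, T, noise_sd=1.0, seed=0):
    """Simulate x_t = A x_{t-1} + e_t, one dimension per hypothetical city."""
    rng = random.Random(seed)
    k = len(A)
    x = [0.0] * k
    series = []
    for _ in range(T):
        # The comprehension reads the previous x before it is reassigned.
        x = [sum(A[i][j] * x[j] for j in range(k)) + rng.gauss(0.0, noise_sd)
             for i in range(k)]
        series.append(x)
    return series


def solve(M, b):
    """Solve M w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for cc in range(c, n + 1):
                aug[r][cc] -= f * aug[c][cc]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (aug[r][n] - sum(aug[r][cc] * w[cc] for cc in range(r + 1, n))) / aug[r][r]
    return w


def fit_var1(series):
    """OLS estimate of A: regress each x_{i,t} on the full vector x_{t-1}."""
    k = len(series[0])
    prev, curr = series[:-1], series[1:]
    # Normal-equation matrix X^T X, shared by all k per-city regressions.
    xtx = [[sum(p[a] * p[b] for p in prev) for b in range(k)] for a in range(k)]
    A_hat = []
    for i in range(k):
        xty = [sum(p[a] * c[i] for p, c in zip(prev, curr)) for a in range(k)]
        A_hat.append(solve(xtx, xty))  # row i: lagged effects on city i
    return A_hat


# Hypothetical influence matrix: city 1 influences city 0 (A[0][1] = 0.3),
# but city 0 does not influence city 1 (A[1][0] = 0.0).
A_true = [[0.5, 0.3],
          [0.0, 0.6]]
series = simulate_var1(A_true, T=5000)
A_hat = fit_var1(series)
for row in A_hat:
    print([round(v, 2) for v in row])
```

On this synthetic series, the estimate should show a clearly positive `A_hat[0][1]` and a near-zero `A_hat[1][0]`. In other words, influence flows from city 1 to city 0 but not back.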

Highlights

  • An increasing proportion of informal communication is conducted in written form, mediated by technology such as smartphones and social media platforms

  • While geographical distance is a prominent predictor, the absolute difference in the proportion of African Americans is the strongest: the more similar two metropolitan areas are in terms of this demographic, the more likely it is that linguistic influence is transmitted between them

  • By tracking the popularity of words over time and space, we can harness large-scale data to uncover the hidden structure of language change
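Tracking a word's popularity over space starts from a simple comparison. A statement such as "a word occurs six times more frequently in one area than in the United States overall" can be read as a word's rate in a region divided by its rate in the whole corpus. The sketch below assumes counts are already aggregated per region. The region names and counts are hypothetical, not drawn from the Twitter corpus. Repeating the computation for each time bin would add the time dimension.

```python
def relative_frequency_ratios(counts):
    """counts maps region -> (occurrences of the word, total tokens in region).

    Returns region -> (regional rate) / (overall rate across all regions).
    """
    total_word = sum(word for word, _ in counts.values())
    total_tokens = sum(tokens for _, tokens in counts.values())
    overall_rate = total_word / total_tokens
    return {region: (word / tokens) / overall_rate
            for region, (word, tokens) in counts.items()}


# Hypothetical counts for one word in two regions.
counts = {
    "region_A": (300, 1_000_000),
    "region_B": (200, 9_000_000),
}
ratios = relative_frequency_ratios(counts)
print(ratios)  # region_A ≈ 6 (over-represented), region_B ≈ 0.44
```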

Summary

Introduction

An increasing proportion of informal communication is conducted in written form, mediated by technology such as smartphones and social media platforms. In our corpus of social media text from 2009 to 2012, the abbreviation ikr (I know, right?) occurs six times more frequently in the Detroit area than in the United States overall; the emoticon ^-^ occurs four times more frequently in Southern California; and the phonetic spelling suttin (something) occurs five times more frequently in New York City. These differences raise questions about how language change spreads in online communication. We model the diffusion of linguistic change as a network of influence between metropolitan areas. We then seek explanations for this network in a set of demographic and geographic predictors, using a logistic regression in which these predictors explain the induced transmission pathways.
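The final step uses a logistic regression to explain transmission pathways with pairwise predictors. The sketch below shows that step in pure Python, under several assumptions:

* Each observation is a pair of cities, labeled 1 if a pathway was induced and 0 otherwise.
* There are two hypothetical predictors: the absolute difference in the proportion of African American residents, and the distance between the cities in thousands of km.
* The data are simulated from an invented rule.
* The fit uses plain batch gradient descent on standardized predictors, which may differ from the authors' estimator.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(columns):
    """Return z-scored copies of each predictor column."""
    scaled = []
    for col in columns:
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        scaled.append([(v - mu) / sd for v in col])
    return scaled


def fit_logistic(columns, labels, lr=1.0, iters=300):
    """Batch gradient descent on mean log-loss.

    Returns [intercept, b1, b2, ...] with slopes in standardized units.
    """
    xs = standardize(columns)
    n, k = len(labels), len(xs)
    w = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for i in range(n):
            z = w[0] + sum(w[j + 1] * xs[j][i] for j in range(k))
            err = sigmoid(z) - labels[i]  # derivative of log-loss w.r.t. z
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xs[j][i]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def simulate_pairs(n=1000, seed=0):
    """Hypothetical city pairs; the 'true' logit 2 - 10*diff - 0.8*dist is invented."""
    rng = random.Random(seed)
    diffs, dists, links = [], [], []
    for _ in range(n):
        d = rng.uniform(0.0, 0.5)   # |difference in proportion African American|
        km = rng.uniform(0.0, 3.0)  # distance in thousands of km
        p = sigmoid(2.0 - 10.0 * d - 0.8 * km)
        diffs.append(d)
        dists.append(km)
        links.append(1 if rng.random() < p else 0)
    return diffs, dists, links


diffs, dists, links = simulate_pairs()
w = fit_logistic([diffs, dists], links)
print([round(v, 2) for v in w])  # slopes for diff and distance should be negative
```

A negative slope means that pairs which are more demographically different, or farther apart, are less likely to be linked. This matches the direction the highlights and abstract describe for the real data.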
