Abstract

Data gathered from social media have been used extensively to examine lexical dialect variation in widely used languages such as English and Spanish, but their use to date in morphosyntax and for lesser-used languages has been more limited. This paper tests how useful Twitter-derived data are for addressing traditional questions in dialect syntax and sociolinguistics. It uses two case studies from Welsh (the form of the second-person singular pronoun in various syntactic contexts, and the availability of auxiliary deletion) to assess whether datasets based on Twitter data can successfully replicate and enhance results derived by traditional means. The results of the case studies coincide to a large extent with distributions established in existing studies, even ones using entirely different methods, such as dialect questionnaires or acceptability judgment tests. Twitter data also show considerable success in establishing implicational hierarchies and conditioning factors comparable to those typical of the field. Where the results differ from existing studies, the differences may be due to the younger demographics of Twitter users, or to differences in the quantity of data provided by different methodologies. The results produce patterns closer to spoken data than to written data, giving us reasonable confidence in such data as a relatively good proxy for the spoken usage of large numbers of language users.

Highlights

  • While data from social-media platforms such as Twitter and Facebook have been used by linguists to investigate lexical variation (Russ 2012, Gonçalves & Sánchez 2014 etc.) and change (Grieve, Nini & Guo 2016; 2018), use of such material for morphosyntax has been relatively limited to date

  • This paper addresses some of these issues by considering two cases of morphosyntactic dialect variation in Welsh, a language with a relatively small presence in social media

  • Welsh is regularly used in social media: Kevin Scannell reports over 14,000 Twitter users as tweeting in Welsh with some 5.7 million tweets having been composed in Welsh, and the number has likely risen significantly since these figures were last updated in 2014

Summary

Introduction

While data from social-media platforms such as Twitter and Facebook have been used by linguists to investigate lexical variation (Russ 2012, Gonçalves & Sánchez 2014 etc.) and change (Grieve, Nini & Guo 2016; 2018), use of such material for morphosyntax has been relatively limited to date.

Willis: Using social-media data to investigate morphosyntactic variation and dialect syntax in a lesser-used language

[…] tweets in Spanish to map lexical variation across the Spanish-speaking world. Other studies in this tradition include Scheffler et al. (2014), Gonçalves & Sánchez (2016), Huang et al. (2016), Donoso & Sánchez (2017), Eisenstein (2017), Shoemark, Kirby & Goldwater (2017) and Grieve et al. (2019).

