Articulation Rate in American English in a Corpus of YouTube Videos

Steven Coats

doi:10.1177/0023830919894720

Abstract

Previous studies of the temporal organization of speech in American English have found differences in speaking or articulation rate according to speaker dialect or location, but small sample sizes and incomplete geographic coverage have limited the generalizability of the findings. In this study, articulation rates in American English are calculated from the automatic speech-to-text transcripts of more than 29,000 hours of video from local government and civic organization channels on YouTube from the 48 contiguous US states, containing more than 230 million individual word timings. Two questions are considered: Are there regional differences in articulation rate, and do urban speakers articulate faster than rural speakers? The study presents several methodological innovations. First, it identifies a genre of regional speech suitable for interregional comparisons (meetings of local governments or civic organizations). Second, it introduces a new method for calculating articulation rate from the cue and word timestamps in caption files. Third, it leverages US Census data to correlate articulation rate with population size for a large number of localities. The study shows that, in line with previous studies, Southerners articulate more slowly and Americans from the Upper Midwest more quickly. In addition, there is a small positive correlation between population size and articulation rate. Articulation rates are mapped using a measure of local autocorrelation.
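As an illustration only, the Python sketch below shows one way an articulation rate could be derived from word-level caption timestamps. It is not necessarily the procedure used in the study. It assumes that every word has an onset time, that a word lasts until the next word's onset (or until the cue end for the last word in a cue), and that an inter-onset interval longer than a hypothetical 0.5 s threshold contains a pause and is excluded. Syllables are approximated by counting runs of vowel letters, which is a crude stand-in for a pronouncing dictionary. The names `Word`, `count_syllables`, `articulation_rate`, and `PAUSE_THRESHOLD` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical pause threshold in seconds: an inter-onset interval longer than
# this is assumed to contain a pause and is excluded from the calculation.
PAUSE_THRESHOLD = 0.5


@dataclass
class Word:
    text: str     # recognized word from the caption track
    start: float  # word onset in seconds (assumed available for every word)


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowel letters, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def articulation_rate(words: list[Word], cue_end: float,
                      pause_threshold: float = PAUSE_THRESHOLD) -> float:
    """Return syllables per second of speaking time within one caption cue.

    Each word is assumed to last until the next word's onset (or until the
    cue end for the last word). Intervals longer than pause_threshold are
    treated as pause-containing and dropped from numerator and denominator.
    """
    syllables = 0
    speaking_time = 0.0
    for i, word in enumerate(words):
        end = words[i + 1].start if i + 1 < len(words) else cue_end
        interval = end - word.start
        if interval <= 0 or interval > pause_threshold:
            continue  # skip implausible or pause-containing intervals
        syllables += count_syllables(word.text)
        speaking_time += interval
    if speaking_time == 0:
        raise ValueError("no pause-free speech in this cue")
    return syllables / speaking_time


if __name__ == "__main__":
    cue = [Word("we", 0.0), Word("will", 0.2), Word("now", 0.45), Word("begin", 0.7)]
    # 5 estimated syllables over 1.1 s of pause-free speech
    print(round(articulation_rate(cue, cue_end=1.1), 2))
```

Dropping whole pause-containing intervals, rather than trimming them, keeps the sketch simple at the cost of discarding some words; a word-duration model or forced alignment would retain more data.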
