Data for: The unstoppable glottal: Tracking rapid change in an iconic British variable

Jennifer Smith, Holmes-Elliott

doi:10.5064/f6ope7mj

Abstract

This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the publisher's website. We have concentrated on a number of key stages of sociolinguistic research, with specific reference to the collection, processing and statistical analysis of data.

Recordings of speech data: we are linguists working on speech data, yet we rely on written data to convey the core materials we work with. We therefore include examples of actual speech recordings to provide concrete support for our claim that the data we are working with diverge significantly from mainstream norms.

Data preparation and coding

Transcription – example of protocol in action: the transcription of speech data must satisfy two, often competing, criteria: it has to be 1) an accurate reflection of what was actually said and 2) transparent and accessible for analysis. Achieving this is no easy feat, so we include the full transcription protocol here to highlight the complexities of representing speech data in written format: what changes, what does not, and why.

Coding and annotation – from sound file to transcript to coded data: this phase of the research is often relegated to one or two lines in a journal article. Our own paper is an example, stating only that ‘we extracted approx. 100 tokens per speaker per insider/outsider interview’. In this annotation we show how this is actually done, demonstrating how we isolate the linguistic variable in the original text-to-sound aligned transcribed data, and how this annotation prepares for the eventual extraction of the variable context under analysis.

Coding schema: the coding schema arises from two different sources: 1) what has been found in previous research; 2) observation of the current data. As such, there are multiple possibilities for what governs the observed variability. The initial coding schema sets out to test these multiple possibilities. Occam’s Razor is then applied to these multiple categories in sifting the data for the best fit, resulting in the leaner, more interpretable coding schema presented in the final article. We have included in this annotation the original, more elaborate categories to highlight the behind-the-scenes work that takes place in making sense of the data. We also include sound files of the actual variants used. These allow the user to hear the different environments set out in the final coding schema as they occur in the object of study: spontaneous speech data.

Statistical analysis – the program used: a challenge of statistical analysis is that the field constantly evolves. This annotation is a case in point: the version of the program we used is now deprecated and no longer supported. The new version is more than a superficial change to the graphical interface and represents a completely different approach to the way the models are built (stepping-up based on p-values as opposed to stepping-down from fully saturated models). The wider implication is that analyses may not be fully replicable, particularly as the software becomes obsolete. We therefore provide further information on the program used to highlight this potential problem.

Statistical analysis – procedure: the description of the statistical analysis which appears in the final journal article is usually a ‘final model’ outlined in a linear fashion, but in reality that model results from many iterations in which many different models are run and cross-referenced. The final model is a trade-off between accuracy and elegance; we are aiming for the ‘best fit’ but also the simplest or most straightforward computation. As we outline, in this case we decided to model each generation separately, as this provided a clearer route to answering our research questions. However, other analysts may argue that a fully saturated model which represents all the interactions together is more accurate. Including this annotation provides further rationale for the model(s) we eventually used in the article.

Full Text