Drawing areal information from a corpus of noisy dialect data

Alfred Lameli,Philipp Stöckle,Elvira Glaser

doi:10.1017/jlg.2020.4

Alfred Lameli, Philipp Stöckle + Show 1 more

Open Access

https://doi.org/10.1017/jlg.2020.4

Copy DOI

Abstract

AbstractThis article is an analysis of linguistic survey data representing German dialects in Switzerland in 1933/34 based on the so-called Wenker sentences. The data are impressionistic in terms of applied phonetic transcriptions, which were produced by non-specialists using the Latin alphabet. Due to the lack of pre-defined standardization, the phonetic transcriptions are very heterogeneous. From a technical perspective, this leads to very noisy data, which is why the validity of the Wenker data in general and the Swiss Wenker data in particular has been questioned. Using methods from computational linguistics, we compare, for the first time, Wenker data with linguistic data collected at virtually the same time by linguistics professionals. Direct comparison with a sample from the published atlas of German-speaking Switzerland (SDS) reveals that despite the noisiness of the data, they nevertheless provide reliable information, e.g., in terms of the spatial structuring of Swiss dialects. The study is thus a successful pilot for other corpus-based studies dealing with unstructured Wenker data in other regions.

Full Text