Abstract
This paper presents a method which we are developing to explore graphemic variation in large historical corpora of German. Historical corpora provide an amount of data at the level of graphemics which cannot be handled exhaustively using common methods of manual evaluation. To deal with this challenge, we apply methods from computational linguistics to pave the way for a broad-coverage graph(em)ic analysis of large historical corpora. In this paper, we show how our approach can be applied to the Reference Corpus of Middle High German. Illustrating our method and linguistic analysis, we present findings from our investigations into diatopic and/or diachronic variation as documented in 13th and 14th century charters (Urkunden) from the corpus.
Highlights
The methods we present in this paper answer the call for semi-automatic means to analyze graphemic variation in historical texts.
The graphemic level provides data sets that consist, on a basic level, of nothing but character strings, which can be processed automatically. We use this fact to our advantage: the computational linguistic methods that we use are based on methods developed for normalizing historical spellings, i.e., for automatically mapping a historical spelling variant to a standardized form.
On the level of historical graphemics, our goal is to map out the above-mentioned continuum of different ‘levels’ of variation in detail.
Summary
The methods we present in this paper answer the call for semi-automatic means to analyze graphemic variation in historical texts (cf. Elmentaler 2018: 335). For example, word-initial ko- in one variety might correspond to cho- in the other variety (as in chomen vs. komen ‘come’). These mappings form the basis for our graphemic investigations. In Dipper and Waldenberger (2017), we applied the described methodology for the first time and examined mappings derived from a parallel corpus of Early New High German texts from different dialects, with large overlaps in vocabulary. The results of this pilot study were promising in that relevant variants could be automatically identified.
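To make the idea of such mappings concrete, the following is a minimal sketch of how character-level correspondences could be read off pairs of equivalent word forms. It is not the procedure used in the paper: it relies on Python's standard difflib alignment as a stand-in, and the function name char_mappings as well as the word pairs other than komen/chomen are assumptions introduced purely for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_mappings(word_a, word_b):
    """Return (substring_a, substring_b) pairs where two spellings differ.

    Illustrative helper (hypothetical name): uses difflib's generic
    string alignment, not the alignment method of the paper.
    """
    matcher = SequenceMatcher(None, word_a, word_b, autojunk=False)
    return [
        (word_a[i1:i2], word_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the stretches that differ
    ]


# Pairs of equivalent word forms from two varieties.
# Only komen/chomen is taken from the text; the other pairs are
# assumed examples added for illustration.
pairs = [("komen", "chomen"), ("kint", "chint"), ("korn", "chorn")]

# Count how often each correspondence occurs across all pairs.
counts = Counter(
    mapping for a, b in pairs for mapping in char_mappings(a, b)
)

for (src, tgt), freq in counts.most_common():
    print(f"{src} -> {tgt}: {freq}")
```

Counting the correspondences over many word pairs is what would let frequent, systematic variants stand out from isolated spelling differences. A fuller version would presumably also record the context of each correspondence, such as word-initial position.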