Abstract

This paper presents a method that we are developing to explore graphemic variation in large historical corpora of German. At the level of graphemics, historical corpora provide more data than can be handled exhaustively with common methods of manual evaluation. To deal with this challenge, we apply methods from computational linguistics to pave the way for a broad-coverage graph(em)ic analysis of large historical corpora. We show how our approach can be applied to the Reference Corpus of Middle High German. To illustrate our method and linguistic analysis, we present findings from our investigations into diatopic and/or diachronic variation as documented in 13th- and 14th-century charters (Urkunden) from the corpus.

Highlights

  • The methods we present in this paper answer the call for semi-automatic means to analyze graphemic variation in historical texts

  • The graphemic level provides data sets that consist, on a basic level, of nothing but character strings, which can be processed automatically. We use this fact to our advantage: the computational linguistic methods that we use are based on methods developed for normalizing historical spellings, i.e., for automatically mapping a historical spelling variant to a standardized form

  • On the level of historical graphemics, our goal is to map out the above-mentioned continuum of different ‘levels’ of variation in detail


Summary

Introduction

The methods we present in this paper answer the call for semi-automatic means to analyze graphemic variation in historical texts (cf. Elmentaler 2018: 335). For instance, word-initial ko- in one variety might correspond to cho- in the other variety (as in chomen vs. komen ‘come’). These mappings form the basis for our graphemic investigations. In Dipper and Waldenberger (2017), we applied the described methodology for the first time and examined mappings derived from a parallel corpus containing texts of different dialects from Early New High German, with large overlaps in vocabulary. The results of this pilot study were promising in that relevant variants could be identified automatically.
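The idea of deriving character-level mappings from equivalent word forms can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in a plain string alignment from Python's standard `difflib` module in place of the normalization-based methods the paper builds on. The function names `char_mappings` and `difference_profile` are hypothetical, and apart from komen/chomen the word pairs are invented for illustration and are not taken from the corpus.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_mappings(source: str, target: str) -> list[tuple[str, str]]:
    """Return (source_substring, target_substring) pairs where two spellings differ.

    Assumption: a generic longest-matching-block alignment stands in for the
    trained mapping rules used in spelling normalization.
    """
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    mappings = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'insert' and 'delete' all mark a graphemic difference.
            mappings.append((source[i1:i2], target[j1:j2]))
    return mappings


def difference_profile(word_pairs) -> Counter:
    """Count how often each character mapping occurs across all word pairs."""
    counts = Counter()
    for source, target in word_pairs:
        counts.update(char_mappings(source, target))
    return counts


# Illustrative pairs (variety 1 form, variety 2 form); only komen/chomen
# comes from the text above, the others are made up for this sketch.
pairs = [("komen", "chomen"), ("korn", "chorn"), ("kunig", "chunig")]
profile = difference_profile(pairs)

for (src, tgt), freq in profile.most_common():
    print(f"{src!r} -> {tgt!r}: {freq}")
```

Counting mappings over many aligned word pairs gives a frequency-ranked list of correspondences between two texts or varieties. Such a list only hints at what a real analysis would need to interpret, since a generic alignment has no notion of position in the word or of linguistically meaningful units.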

Generating difference profiles
Interpreting difference profiles
Text pairings reflecting diatopic variation
Text pairings reflecting diachronic variation
Statistically determined graphemic similarities
Conclusion
