Abstract

Automatically dividing Middle Dutch words into syllables is a challenging task. A first method was presented by Bouma and Hermans (2012), who combined a rule-based finite-state component with data-driven error correction. With an average word accuracy of 96.5%, their system performs well, although it leaves room for improvement. More generally, rule-based methods are less attractive for a medieval language like Middle Dutch, where each dialect has its own spelling preferences and where there is also much idiosyncratic variation among scribes. This paper presents a different method for automatically syllabifying Middle Dutch words, one that does not rely on pre-defined linguistic information. Using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells, we obtain a system that outperforms the rule-based method both in robustness and in the effort required.

Highlights

  • The Middle Dutch word for “damsel” has the following spelling variants, among others: joncfrouwe, joncvrauwe, joncvrouwe, joncvrovwe, jonvrowe, ioncfrouwe, ionfrouwe, ioffrouwe. The extensive orthographic variation in Middle Dutch is the subject of a paper by Van Halteren and Rem (2013), who noted that the lemma gelijk (“”) has 24 different word forms in the Corpus Van Reenen-Mulder

  • We propose a syllabification method that takes a preannotated list of syllabified Middle Dutch words as input for a Recurrent Neural Network (RNN) tagger

  • Partie (“part”), a word frequently used by Maerlant in his rhymed chronicle Spiegel Historiael, was most likely pronounced /par.ti.jə/. This can be deduced from rhyme pairs such as partie : lije, where the grapheme 〈j〉 has the sound value of a consonant and marks a syllable boundary

Summary

Introduction

The best way to go about this task would be a simple look-up query in a dictionary where words are stored alongside their syllabified versions. This method is, however, unattainable for Middle Dutch, for two main reasons: no list is available with all the different spelling variants of every Middle Dutch word, and the existing dictionary does not contain syllabified versions of lemmas. One would therefore like an automatic system that is able to correctly determine syllable boundaries while dealing flexibly with this multitude of spelling variation. We propose a syllabification method that takes a preannotated list of syllabified Middle Dutch words as input for an RNN tagger.
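To make the tagging formulation concrete, the sketch below shows one plausible way a preannotated, syllabified word could be turned into a character sequence with one boundary label per character, which is the kind of input/output pair a sequence tagger is trained on. The hyphen separator, the binary label scheme (1 = character starts a new syllable), the function names, and the segmentation of the example word are all assumptions made for illustration; the excerpt does not specify the encoding actually used.

```python
from typing import List, Tuple


def to_tags(syllabified: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Split a syllabified word into its plain spelling and per-character tags.

    Hypothetical label scheme: 1 marks a character that opens a new
    (non-initial) syllable, 0 marks every other character.
    """
    chars: List[str] = []
    tags: List[int] = []
    for syl_index, syllable in enumerate(syllabified.split(sep)):
        for char_index, ch in enumerate(syllable):
            chars.append(ch)
            starts_new_syllable = syl_index > 0 and char_index == 0
            tags.append(1 if starts_new_syllable else 0)
    return "".join(chars), tags


def from_tags(word: str, tags: List[int], sep: str = "-") -> str:
    """Rebuild a syllabified string from a word and (predicted) tags."""
    pieces: List[str] = []
    for ch, tag in zip(word, tags):
        if tag == 1:
            pieces.append(sep)
        pieces.append(ch)
    return "".join(pieces)


# Illustrative segmentation only, not taken from the annotated data.
word, tags = to_tags("jonc-vrou-we")
restored = from_tags(word, tags)
```

Because the labels attach to individual characters rather than to dictionary entries, the same encoding applies unchanged to any spelling variant, which is what lets a tagger generalise where a look-up table cannot.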

Rules for syllabification of Modern Dutch
Syllabification of Middle Dutch
Previous research by Bouma and Hermans
Experiment and results
Data set
Results
Model inspection
Model criticism
Findings
Conclusion