Data-Driven Syllabification for Middle Dutch

Wouter Haverals, Mike Kestemont, Folgert Karsdorp

doi:10.16995/dm.83

Abstract

Automatically separating Middle Dutch words into syllables is a challenging task. A first method was presented by Bouma and Hermans (2012), who combined a rule-based finite-state component with data-driven error correction. With an average word accuracy of 96.5%, their system performs well, although it leaves room for improvement. Generally speaking, rule-based methods are less attractive for dealing with a medieval language like Middle Dutch, where each dialect has its own spelling preferences and where there is also much idiosyncratic variation among scribes. This paper presents a different method for automatically syllabifying Middle Dutch words, one that does not rely on pre-defined linguistic information. Using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells, we obtain a system which outperforms the rule-based method both in robustness and in the effort required.
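The word accuracy figure quoted above can be made concrete with a small sketch. It assumes that "word accuracy" means the fraction of words whose predicted syllabification matches the reference exactly; the abstract does not define the metric, so this reading is an assumption. The function name `word_accuracy` and the hyphenated example words are hypothetical and purely illustrative.

```python
from typing import Sequence


def word_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of words whose predicted syllabification equals the gold one.

    Assumption: a word counts as correct only if every syllable boundary
    matches, i.e. the two hyphenated strings are identical.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Illustrative, made-up syllabifications (not taken from the paper's data).
gold = ["jonc-vrou-we", "par-ti-e"]
predicted = ["jonc-vrou-we", "part-i-e"]
score = word_accuracy(gold, predicted)
```

Under this definition a single misplaced boundary makes the whole word count as wrong, which is stricter than scoring each boundary separately.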

Highlights

The Middle Dutch word for “damsel” has the following spelling variants, among others: joncfrouwe, joncvrauwe, joncvrouwe, joncvrovwe, jonvrowe, ioncfrouwe, ionfrouwe, ioffrouwe, etc. The extensive orthographic variation in Middle Dutch is the subject of a paper by Van Halteren and Rem (2013), who noted that the lemma gelijk (“”) has 24 different word forms in the Corpus Van Reenen-Mulder
We propose a syllabification method that takes a preannotated list of syllabified Middle Dutch words as input for a Recurrent Neural Network (RNN) tagger
Partie (“part”), for instance, a word frequently used by Maerlant in his rhymed chronicle Spiegel Historiael, most likely had to be pronounced as /par.ti.jə/. This can be deduced from rhyme pairs such as partie : lije, where the grapheme 〈j〉 has the sound value of a consonant and marks a syllable boundary

Summary

Introduction

It goes without saying that the best way to go about this task would be a simple look-up query in a dictionary, where words are stored alongside their syllabified versions. This method is unattainable for Middle Dutch for two main reasons: there is no list available with all the different spelling variants of every Middle Dutch word, and the existing dictionary does not contain syllabified versions of lemmas. One would therefore like an automatic system that is able to correctly determine syllable boundaries while dealing with this multitude of spelling variation in a flexible way. We propose a syllabification method that takes a preannotated list of syllabified Middle Dutch words as input for an RNN-tagger.
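One common way to feed a preannotated word list to a sequence tagger is to turn each syllabified word into a character sequence with one label per character. The sketch below illustrates that idea under stated assumptions: the hyphen separator, the binary label scheme (1 marks a character that opens a new, non-initial syllable) and the names `encode` and `decode` are all hypothetical, and the summary does not confirm that the authors use this exact encoding. The example split is illustrative only.

```python
from typing import List, Tuple


def encode(syllabified: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Split a hyphenated word into plain characters and per-character labels.

    Label 1 means "a syllable boundary precedes this character";
    label 0 means "no boundary before this character".
    """
    chars: List[str] = []
    labels: List[int] = []
    boundary_pending = False
    for ch in syllabified:
        if ch == sep:
            boundary_pending = True
            continue
        chars.append(ch)
        labels.append(1 if boundary_pending else 0)
        boundary_pending = False
    return "".join(chars), labels


def decode(word: str, labels: List[int], sep: str = "-") -> str:
    """Rebuild the hyphenated form from a word and its boundary labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    pieces: List[str] = []
    for ch, label in zip(word, labels):
        if label == 1:
            pieces.append(sep)
        pieces.append(ch)
    return "".join(pieces)


# Illustrative split of one spelling variant (an assumption, not gold data).
word, labels = encode("jonc-vrou-we")
restored = decode(word, labels)
```

A tagger trained on such pairs predicts one label per character, so an unseen spelling variant can still be syllabified as long as its character patterns resemble those in the training list.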

Rules for syllabification of Modern Dutch

Syllabification of Middle Dutch

Previous research by Bouma and Hermans

Experiment and results

Data set

Results

Model inspection

Model criticism

Findings

Conclusion


Journal: Digital Medievalist	Publication Date: Nov 4, 2019
Citations: 3	License type: CC BY 4.0
