Abstract

Belgian Dutch (BD) and Netherlandic Dutch (ND) are known to exhibit phonetic and lexical differences, but national variation in the syntax of Dutch has often been claimed to be quasi non-existent. This view is rooted in the fact that both laypersons and researchers are oblivious to national divergences in the grammar of Dutch (unless they are categorical and/or heavily mediatized), but also in the undisputed belief that BD and ND are different surface manifestations of ‘the same grammatical motor’. As a result, only a few syntactic phenomena have hitherto been shown to be sensitive to national constraints. In this paper we illustrate a computational bottom-up approach, pioneered in earlier work, that casts the net as widely as possible. Building on statistical machine translation and a parallel corpus of Dutch translations of English subtitles, we identify plausible mappings between English n-grams and their Dutch translations. These mappings yield paraphrases, i.e., stretches of interchangeable Dutch text that carry approximately the same meaning. In a first case study, the discovered paraphrases provided corroborating evidence for many syntactic variables that have previously been attested in Dutch, including complementizer variation, existential er-variation, word order phenomena, and inflection variation. Crucially, we also discovered a number of alternations we had not anticipated as interesting variables. To detect national constraints on the newly found variables, we carried out a second experiment with a smaller corpus of Belgian and Netherlandic subtitles: the two variables we investigated in this light – deictic strength variation and subordination variation – did indeed manifest national sensitivity.
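The paraphrase step described above can be illustrated with a minimal sketch. It assumes that aligned (English n-gram, Dutch phrase) pairs are already available, and it simply treats two Dutch phrases as paraphrase candidates when they translate the same English n-gram. The function name `paraphrase_candidates` and the toy pairs are hypothetical and invented for illustration; they are not taken from the paper's corpus or pipeline, which relies on statistical machine translation and would also weight the mappings.

```python
from collections import defaultdict
from itertools import combinations


def paraphrase_candidates(phrase_pairs):
    """Pair up Dutch phrases that share an English pivot n-gram.

    phrase_pairs: iterable of (english_ngram, dutch_phrase) tuples.
    Returns a dict mapping a sorted (dutch_a, dutch_b) tuple to the
    sorted list of English n-grams that both phrases translate.
    """
    # Collect every Dutch phrase observed for each English n-gram.
    by_pivot = defaultdict(set)
    for english, dutch in phrase_pairs:
        by_pivot[english].add(dutch)

    # Any two Dutch phrases under the same pivot form a candidate pair.
    candidates = defaultdict(set)
    for english, dutch_phrases in by_pivot.items():
        for a, b in combinations(sorted(dutch_phrases), 2):
            candidates[(a, b)].add(english)

    return {pair: sorted(pivots) for pair, pivots in candidates.items()}


# Invented toy input (assumption, not corpus data).
toy_pairs = [
    ("there is", "er is"),
    ("there is", "daar is"),
    ("this one", "deze"),
    ("this one", "die"),
    ("this one", "deze"),  # duplicate alignments are collapsed
]

result = paraphrase_candidates(toy_pairs)
for pair, pivots in sorted(result.items()):
    print(pair, "<-", pivots)
```

A realistic pipeline would additionally filter candidates by translation probability or frequency, since a shared pivot alone produces many spurious pairs.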

