Abstract
Language change is often assessed against a set of pre-determined time periods in order to be able to trace its diachronic trajectory. This is problematic, since a pre-determined periodization might obscure significant developments and lead to false assumptions about the data. Moreover, these time periods can be based on factors which are either arbitrary or non-linguistic, e.g., dividing the corpus data into equidistant stages or taking into account language-external events. Addressing this problem, in this paper we present a data-driven approach to periodization: ‘DiaHClust’. DiaHClust is based on iterative hierarchical clustering and offers a multi-layered perspective on change from text-level to broader time periods. We demonstrate the usefulness of DiaHClust via a case study investigating syntactic change in Icelandic, modelling the syntactic system of the language in terms of vectors of syntactic change.
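To make the idea of clustering-based periodization concrete, the following is a minimal sketch of a generic time-ordered agglomerative clustering. It is not the DiaHClust procedure itself, whose details are not given here. It assumes each text is represented by a date and a numeric feature vector, and it only ever merges neighbouring clusters, so every resulting cluster is a contiguous time span. The function names (`periodize`, `mean_vector`, `euclidean`) and the toy data are invented for illustration.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean (centroid) of a list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def periodize(texts, n_periods):
    """Merge date-ordered texts into n_periods contiguous clusters.

    texts: list of (year, feature_vector) pairs (hypothetical input format).
    Returns a list of (first_year, last_year) spans, one per cluster.
    """
    # Sort by date; every text starts out as its own cluster.
    clusters = [[t] for t in sorted(texts, key=lambda t: t[0])]
    while len(clusters) > max(n_periods, 1):
        # Centroid distance between each pair of neighbouring clusters.
        dists = [
            euclidean(
                mean_vector([vec for _, vec in clusters[i]]),
                mean_vector([vec for _, vec in clusters[i + 1]]),
            )
            for i in range(len(clusters) - 1)
        ]
        # Merge the two most similar neighbours into one cluster.
        i = dists.index(min(dists))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return [(cluster[0][0], cluster[-1][0]) for cluster in clusters]


# Invented toy data: (year, relative frequencies of two syntactic features).
toy_texts = [
    (1150, [0.90, 0.10]), (1210, [0.88, 0.12]), (1350, [0.86, 0.15]),
    (1540, [0.60, 0.40]), (1650, [0.58, 0.43]),
    (1850, [0.20, 0.80]), (1900, [0.18, 0.82]),
]

periods = periodize(toy_texts, 3)
print(periods)
```

Calling `periodize` with different values of `n_periods` gives coarser or finer groupings of the same texts, which is one simple way to look at the data at several levels of granularity.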
Highlights
In historical linguistics, it is generally acknowledged that language change proceeds gradually rather than abruptly (e.g., Kroch, 2001).
With DiaHClust, we show that a data-driven periodization methodology can be applied to a language like Icelandic, where syntactic change is not as extreme as in other Germanic languages, and where the available annotated corpus data is relatively sparse.
Source: Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 126–135, Florence, Italy, August 2, 2019. © 2019 Association for Computational Linguistics.
We have developed DiaHClust for a study of syntactic change in Icelandic based on data from the Icelandic Parsed Historical Corpus (‘IcePaHC’, Wallenberg et al., 2011).
Summary
It is generally acknowledged that language change proceeds gradually rather than abruptly (e.g., Kroch, 2001). Nevertheless, change is often assessed against a set of pre-determined time periods. The problematic nature of this methodology is well known, though rarely made explicit (see, e.g., Curzan, 2012). Such an approach may yield results which conceal the true trajectory of a phenomenon. With the boom in corpus-based and computational studies of language change over recent decades, the periodization problem has been readdressed, as new data-driven methodologies have emerged in relation to historical English (see, e.g., Gries and Hilpert, 2008, 2012; Degaetano-Ortlieb and Teich, 2018). The periodization scheme can be arrived at via a range of statistical methods, e.g., hierarchical clustering and relative entropy. This yields objective, data-driven periodization schemes which are faithful to the corpus data and can still be used to arrive at meaningful generalizations. With DiaHClust, we show that a data-driven periodization methodology can be applied to a language like Icelandic, where syntactic change is not as extreme as in other Germanic languages, and where the available annotated corpus data is relatively sparse.
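As an illustration of relative entropy, one of the statistical methods named above, the sketch below computes the Kullback-Leibler divergence between two discrete distributions. The distributions are invented toy values standing in for the relative frequencies of two competing variants in two time slices. They are not taken from IcePaHC or from any of the cited studies, and the function name `relative_entropy` is likewise only illustrative.

```python
from math import log2


def relative_entropy(p, q):
    """Relative entropy D(P || Q) in bits.

    p, q: dicts mapping the same categories to probabilities.
    Assumes q[k] > 0 wherever p[k] > 0.
    """
    return sum(pk * log2(pk / q[k]) for k, pk in p.items() if pk > 0)


# Invented toy distributions over two variants in two time slices.
earlier = {"variant_a": 0.5, "variant_b": 0.5}
later = {"variant_a": 0.25, "variant_b": 0.75}

print(relative_entropy(earlier, later))    # divergence between the slices
print(relative_entropy(earlier, earlier))  # identical distributions give 0.0
```

Relative entropy is asymmetric, so `relative_entropy(earlier, later)` and `relative_entropy(later, earlier)` generally differ. A comparatively large value between adjacent time slices could be read as a candidate period boundary, though how such values are thresholded or compared is a design choice that this sketch leaves open.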