Abstract

Language change is often assessed against a set of pre-determined time periods in order to be able to trace its diachronic trajectory. This is problematic, since a pre-determined periodization might obscure significant developments and lead to false assumptions about the data. Moreover, these time periods can be based on factors which are either arbitrary or non-linguistic, e.g., dividing the corpus data into equidistant stages or taking into account language-external events. Addressing this problem, in this paper we present a data-driven approach to periodization: ‘DiaHClust’. DiaHClust is based on iterative hierarchical clustering and offers a multi-layered perspective on change from text-level to broader time periods. We demonstrate the usefulness of DiaHClust via a case study investigating syntactic change in Icelandic, modelling the syntactic system of the language in terms of vectors of syntactic change.
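The abstract describes DiaHClust as iterative hierarchical clustering over vectors of syntactic change. The following Python sketch is a rough illustration only, not the authors' implementation. It shows temporally constrained agglomerative clustering in the spirit of variability-based neighbour clustering (VNC), where only chronologically adjacent texts or periods may merge. Several parts are assumptions: the per-text feature vectors, the use of Euclidean distance between cluster centroids as the merge criterion (Gries and Hilpert's VNC uses a variability-based measure instead), and the hypothetical helpers `neighbour_clustering` and `layered_periodization`.

```python
import math
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def neighbour_clustering(vectors, n_periods):
    """Merge chronologically adjacent clusters until n_periods remain.

    `vectors` must be in chronological order (one per text). Only neighbouring
    clusters may merge, so every cluster is a contiguous stretch of time.
    Merge criterion (an assumption): Euclidean distance between centroids.
    """
    if n_periods < 1:
        raise ValueError("n_periods must be at least 1")
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_periods:
        centroids = [centroid([vectors[i] for i in c]) for c in clusters]
        gaps = [math.dist(centroids[k], centroids[k + 1])
                for k in range(len(clusters) - 1)]
        k = gaps.index(min(gaps))  # the most similar adjacent pair
        clusters[k:k + 2] = [clusters[k] + clusters[k + 1]]
    return clusters


def layered_periodization(vectors, n_periods, depth, offset=0):
    """Hypothetical multi-layer scheme: split into periods, then re-split each one.

    Returns nested lists of text indices. Recursion stops when `depth` reaches 0
    or when a period contains no more than `n_periods` texts.
    """
    if depth == 0 or len(vectors) <= n_periods:
        return list(range(offset, offset + len(vectors)))
    layers = []
    for cluster in neighbour_clustering(vectors, n_periods):
        sub = [vectors[i] for i in cluster]
        layers.append(layered_periodization(sub, n_periods, depth - 1, offset + cluster[0]))
    return layers
```

Consider six toy texts (invented values, not IcePaHC data): `vectors = [[1.0, 0.0], [0.96, 0.04], [0.85, 0.15], [0.3, 0.7], [0.22, 0.78], [0.0, 1.0]]`. For these, `neighbour_clustering(vectors, 2)` returns `[[0, 1, 2], [3, 4, 5]]`, and `layered_periodization(vectors, 2, 2)` returns `[[[0, 1], [2]], [[3, 4], [5]]]`. The result is two broad periods, each split again into sub-periods. Restricting merges to neighbours is what makes the output a periodization rather than an unordered grouping of texts.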

Highlights

  • In historical linguistics, it is generally acknowledged that language change proceeds gradually rather than abruptly (e.g., Kroch, 2001).

  • With DiaHClust, we show that a data-driven periodization methodology can be applied to a language like Icelandic, where syntactic change is not as extreme as in other Germanic languages, and where the available annotated corpus data is relatively sparse.

  • We have developed DiaHClust for a study of syntactic change in Icelandic based on data from the Icelandic Parsed Historical Corpus (‘IcePaHC’, Wallenberg et al., 2011).

Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 126–135, Florence, Italy, August 2, 2019. © 2019 Association for Computational Linguistics


Summary

Introduction

It is generally acknowledged that language change proceeds gradually rather than abruptly (e.g., Kroch, 2001). Nevertheless, change is often assessed against a set of pre-determined time periods. The problematic nature of this methodology is well known, though rarely made explicit (see, e.g., Curzan, 2012): such an approach may yield results which conceal the true trajectory of a phenomenon.

With the boom in corpus-based and computational studies of language change over recent decades, the periodization problem has been readdressed, and new data-driven methodologies have emerged in relation to historical English (see, e.g., Gries and Hilpert, 2008, 2012; Degaetano-Ortlieb and Teich, 2018). In these approaches, the periodization scheme is arrived at via a range of statistical methods, e.g., hierarchical clustering and relative entropy. This yields objective, data-driven periodization schemes which are faithful to the corpus data and can still be used to arrive at meaningful generalizations. With DiaHClust, we show that a data-driven periodization methodology can also be applied to a language like Icelandic, where syntactic change is not as extreme as in other Germanic languages, and where the available annotated corpus data is relatively sparse.
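Relative entropy, one of the statistical methods mentioned above, can be illustrated with a short Python sketch. The sketch computes the Kullback-Leibler divergence between the smoothed frequency distributions of syntactic variants in consecutive time slices. A comparatively high value suggests a candidate period boundary. This is a generic illustration and follows neither Degaetano-Ortlieb and Teich (2018) nor DiaHClust. The additive smoothing constant `alpha`, the direction D(later || earlier), and the function names are all assumptions.

```python
import math


def smoothed_distribution(counts, variants, alpha=0.5):
    """Relative frequencies over `variants`, with additive smoothing (assumed alpha > 0)."""
    total = sum(counts.get(v, 0) for v in variants) + alpha * len(variants)
    return {v: (counts.get(v, 0) + alpha) / total for v in variants}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(P || Q) in bits; P and Q share the same keys."""
    return sum(p[v] * math.log2(p[v] / q[v]) for v in p if p[v] > 0)


def adjacent_divergences(slices, alpha=0.5):
    """D(later || earlier) for each pair of consecutive time slices.

    `slices` is a chronologically ordered list of {variant: count} dicts.
    Smoothing keeps every probability above zero, so the divergence is finite.
    """
    variants = sorted(set().union(*slices))  # union of variant labels across slices
    dists = [smoothed_distribution(s, variants, alpha) for s in slices]
    return [relative_entropy(dists[i + 1], dists[i]) for i in range(len(dists) - 1)]
```

Consider toy counts for two hypothetical variants (invented, not corpus data): `[{"A": 90, "B": 10}, {"A": 85, "B": 15}, {"A": 30, "B": 70}]`. For these, the second divergence is far larger than the first, which points to a boundary between the second and third slice.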

Data-driven approaches to periodization
Methodology
Vectors of syntactic change
Implementation of VNC
Cluster Validation for Cluster Identification
Iterative DiaHClust Approach
Case study: syntactic change in Icelandic
IcePaHC
Syntactic factors under investigation
Application of DiaHClust
Investigating syntactic change
Conclusion