Abstract

The paper discusses the benefits and shortcomings of modelling a language change with logistic regression, an approach often called the Piotrowski-Altmann law. It is shown with an example of an isolated change, which occurred in Middle Polish, namely barzo > bardzo. The study is based on a historical corpus of Polish consisting of several hundreds of texts with over 12 million running words. Logistic regression based on the entire dataset shows relatively high goodness of fit, still there are some data points, especially close to the end of the process, which are quite far removed from the idealised trajectory. In the article, the author seeks to answer the question: to what extent the quality of the corpus affects the model. An experiment was conducted: a number of texts were randomly removed in order to create a smaller corpus, containing 90%, 75% and 50% of the texts of the entire set. Since such procedure is repeated 200 times, it is possible to compare the distribution of the scores indicating the goodness of fit of the model. It turns out that the smaller the corpus, the more diverse the goodness of fit, and in some rare cases it is even better than its counterpart for a larger corpus. Still the larger the corpus, the scores indicating goodness of fit tend to be higher.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.