Abstract

In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps: First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.

Highlights

  • One of the central research topics of linguistic typology concerns the distribution of structural properties across the languages of the world

  • The model will be exemplified with a study of the potential correlations between eight word-order features from the World Atlas of Language Structures (Dryer and Haspelmath, 2013) that were used by Dunn et al. (2011)

  • We identified a total of 1,626 languages for which the World Atlas of Language Structures (WALS) contains information about at least one word-order feature and for which the data from Jäger (2018) contain characters


Summary

INTRODUCTION

One of the central research topics of linguistic typology concerns the distribution of structural properties across the languages of the world. More recent work often uses more sophisticated techniques such as repeated stratified random sampling (e.g., Blasi et al., 2016). Another approach currently gaining traction is the use of (generalized) mixed-effects models (Breslow and Clayton, 1993), in which genealogical units such as families or genera, as well as linguistic areas, are treated as random effects; see, e.g., Atkinson (2011), Bentz and Winter (2013), and Jaeger et al. (2011) for applications to typology. If it is possible to estimate the diachronic transition probabilities, and if one assumes that language change has the Markov property (i.e., is memoryless), one can compute the long-term equilibrium probability of this Markov process. This equilibrium distribution should be used as the basis to identify linguistically meaningful distributional universals. The model will be exemplified with a study of the potential correlations between eight word-order features from the World Atlas of Language Structures (Dryer and Haspelmath, 2013) that were used by Dunn et al. (2011).
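To make the notion of a long-term equilibrium concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single binary feature whose change is governed by a continuous-time Markov process with a rate matrix Q, and it approximates the distribution pi that satisfies pi Q = 0 by iterating a uniformized transition matrix. The function name `equilibrium` and the rate values are hypothetical and chosen only for illustration; in the article the rates are estimated from phylogenies.

```python
def equilibrium(Q, iterations=1000):
    """Approximate the stationary distribution pi (pi Q = 0) of a rate matrix Q.

    Q is a list of rows; off-diagonal entries are transition rates and each
    row sums to zero. Assumes at least one rate is non-zero and that every
    state can be reached from every other state.
    """
    n = len(Q)
    # Uniformization: P = I + Q / lam is a stochastic matrix with the same
    # stationary distribution as Q. Using twice the largest exit rate keeps
    # every diagonal entry of P positive, so the iteration cannot oscillate.
    lam = 2.0 * max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # One step of the discrete chain: pi <- pi P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical rates for illustration only: state 0 changes to state 1 at
# rate 0.3, and state 1 changes back to state 0 at rate 0.1.
Q = [[-0.3, 0.3],
     [0.1, -0.1]]
pi = equilibrium(Q)
print(pi)
```

For a two-state process the result can be checked against the closed form: the equilibrium probability of state 0 is the rate into state 0 divided by the sum of both rates, here 0.1 / (0.1 + 0.3). The same function applies unchanged to the four-state process that arises when two binary features are modeled jointly.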

Continuous Time Markov Processes
Phylogenetic Markov Chains
Word Order Features
Obtaining Language Phylogenies
Generative Models
Prior Predictive Sampling
Model Fitting
Posterior Predictive Sampling
Bayesian Model Comparison
Feature Correlations
DISCUSSION
Word-Order Correlations
Findings
CONCLUSION
DATA AVAILABILITY STATEMENT