Abstract
In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps: First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
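The third step, computing the long-term equilibrium, can be illustrated with a small sketch. Assume a continuous-time Markov chain whose rate matrix Q has already been estimated. Each off-diagonal Q[i][j] is the rate of change from state i to state j, and each row sums to zero. The equilibrium is then the probability vector π satisfying πQ = 0. Several things here are hypothetical illustrations, not the authors' implementation or estimates: the function name `stationary_distribution`, the exact-arithmetic Gauss-Jordan solver, the joint four-state coding of two binary word-order features (only one feature changing at a time), and all rate values.

```python
from fractions import Fraction


def stationary_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC rate matrix Q.

    Q[i][j] (i != j) is the rate of change from state i to state j; rows sum
    to zero. Uses exact Fraction arithmetic and Gauss-Jordan elimination.
    """
    n = len(Q)
    # Transpose: equation i of Q^T pi^T = 0 is sum_j Q[j][i] * pi_j = 0.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # One equation of Q^T pi^T = 0 is redundant; replace it with sum(pi) = 1.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    for col in range(n):
        # Any non-zero pivot works because the arithmetic is exact.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    # A is now diagonal.
    return [b[i] / A[i][i] for i in range(n)]


# Hypothetical joint model of two binary word-order features (made-up rates):
# a = 0 (OV) / 1 (VO), b = 0 (postpositions) / 1 (prepositions); state = 2*a + b.
# Changes toward a harmonic pair (OV+Po, VO+Pr) occur at rate 4, away at rate 1.
Q = [
    [-2, 1, 1, 0],  # (OV, Po) -> (OV, Pr) or (VO, Po)
    [4, -8, 0, 4],  # (OV, Pr) -> (OV, Po) or (VO, Pr)
    [4, 0, -8, 4],  # (VO, Po) -> (OV, Po) or (VO, Pr)
    [0, 1, 1, -2],  # (VO, Pr) -> (OV, Pr) or (VO, Po)
]
pi = stationary_distribution(Q)
print(pi)  # [Fraction(2, 5), Fraction(1, 10), Fraction(1, 10), Fraction(2, 5)]

# Equilibrium probability of prepositions conditional on verb-object order.
p_pr_given_vo = pi[3] / (pi[2] + pi[3])
p_pr_given_ov = pi[1] / (pi[0] + pi[1])
print(p_pr_given_vo, p_pr_given_ov)  # 4/5 1/5
```

With these made-up rates, the preference for changes toward harmonic combinations shows up as a correlation in the long-term distribution: prepositions have equilibrium probability 4/5 among VO languages but only 1/5 among OV languages.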
Highlights
One of the central research topics of linguistic typology concerns the distribution of structural properties across the languages of the world
The model will be exemplified with a study of the potential correlations between eight word-order features from the World Atlas of Language Structures (Dryer and Haspelmath, 2013) that were used by Dunn et al. (2011)
We identified a total of 1,626 languages for which the World Atlas of Language Structures (WALS) contains information about at least one word-order feature and the lexical data from Jäger (2018) contain characters
Summary
One of the central research topics of linguistic typology concerns the distribution of structural properties across the languages of the world. More recent work often uses more sophisticated techniques such as repeated stratified random sampling (e.g., Blasi et al., 2016). Another approach currently gaining traction is the use of (generalized) mixed-effects models (Breslow and Clayton, 1993), in which genealogical units such as families or genera, as well as linguistic areas, are treated as random effects; see, e.g., Atkinson (2011), Bentz and Winter (2013), and Jaeger et al. (2011) for applications to typology. If it is possible to estimate the diachronic transition probabilities, and if one assumes that language change has the Markov property (i.e., is memoryless), one can compute the long-term equilibrium probability of this Markov process. This equilibrium distribution should be used as the basis to identify linguistically meaningful distributional universals. The model will be exemplified with a study of the potential correlations between eight word-order features from the World Atlas of Language Structures (Dryer and Haspelmath, 2013) that were used by Dunn et al. (2011).
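For the simplest case, a single binary feature, the equilibrium of such a Markov process can be written in closed form. The symbols α (rate of change from state 0 to state 1) and β (rate of the reverse change) are introduced here only for illustration. The transition probabilities over a time span t converge to the same limit whatever the starting state, so the equilibrium depends only on the rates, not on the states that present-day languages happen to be in:

```latex
Q = \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix},
\qquad
\pi Q = 0,\quad \pi_0 + \pi_1 = 1
\;\Longrightarrow\;
\pi = \left( \frac{\beta}{\alpha+\beta},\ \frac{\alpha}{\alpha+\beta} \right)

P(t) = e^{Qt} =
\begin{pmatrix}
\frac{\beta}{\alpha+\beta} + \frac{\alpha}{\alpha+\beta} e^{-(\alpha+\beta)t} &
\frac{\alpha}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right) \\
\frac{\beta}{\alpha+\beta}\left(1 - e^{-(\alpha+\beta)t}\right) &
\frac{\alpha}{\alpha+\beta} + \frac{\beta}{\alpha+\beta} e^{-(\alpha+\beta)t}
\end{pmatrix}
\xrightarrow{\;t \to \infty\;}
\begin{pmatrix} \pi_0 & \pi_1 \\ \pi_0 & \pi_1 \end{pmatrix}
```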