The Menzerath-Altmann Law Interpreted through Analysis of Word Structure in Tatar
This study confirms the Menzerath-Altmann law in Tatar, a Turkic language, by analyzing ten texts and finding high model fit (R² from 0.676 to 0.999). It shows that syllable length depends on position and word length, with affixation decreasing syllable size, and the law holds for both token and type data, especially for types.
The validity of the Menzerath-Altmann law has been confirmed in a number of works for languages with different morphology. This research aims to reveal potential correlations between the structure of the Tatar word form and the average length of its constituent syllables. Since the Turkic language family is rarely represented in quantitative linguistics, we tested the law using data from ten Tatar texts, both poetry and prose, and interpreted the results from the point of view of grammar. To assess the goodness of fit of the model, we applied the coefficient of determination R², which for different texts ranged from 0.676 to 0.999. The study revealed that the examined Tatar data generally abide by G. Altmann’s formula: the average syllable length depended both on the syllable’s position in the word and on word length, and adding more affixes decreased the average syllable length. For individual texts, while the law was observed as a trend, we found certain fluctuations in which the average syllable length for sufficiently long words was greater than for relatively short ones. Analyzing the whole corpus, we distinguished between tokens (with frequencies taken into account) and types (unique word forms counted once); the research showed the Menzerath-Altmann law to be valid in both cases, with a better result for types.
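As a rough illustration of the fitting procedure described above, the sketch below fits the truncated form of the law, y = a·x^b, by least squares on log-transformed values and then computes R² on the original scale. The truncated form, the function names, and the synthetic data (generated from a = 3.0, b = -0.2, not taken from the Tatar texts) are all assumptions made for illustration; the abstract does not say which variant of Altmann's formula or which fitting method the authors used.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of ln y = ln a + b * ln x; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic, hypothetical data: word length in syllables versus
# mean syllable length, generated exactly from y = 3.0 * x ** -0.2.
word_lengths = [1, 2, 3, 4, 5, 6]
observed = [3.0 * x ** -0.2 for x in word_lengths]

a, b = fit_power_law(word_lengths, observed)
predicted = [a * x ** b for x in word_lengths]
r2 = r_squared(observed, predicted)
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.3f}")
```

Because the synthetic points lie exactly on the curve, the fit recovers the generating parameters and R² is essentially 1; real text data would scatter around the curve and give lower values, as in the range reported above.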
- Research Article
16
- 10.1076/jqul.8.1.1.4091
- Apr 1, 2001
- Journal of Quantitative Linguistics
Continuing Best (1998), this paper presents new investigations of the Göttingen Project on Quantitative Linguistics, which aims to examine the laws controlling the frequency distributions of different kinds of linguistic units in texts and lexica. The main topic was the distribution of word lengths in texts; up to now, more than 40 languages have been investigated with promising results. In the meantime, some word length distributions in lexica have been considered, as well as the distributions of many other entities in texts. New results concerning the distributions of parts of speech suggest a more general validity of the law, which at the very beginning was intended for word length distributions only. For the time being, there exist very few test results which do not support it. The law of probability distributions concerning classes of entities can be seen as a kind of ‘horizontal’ language structuring beside others like the distributions of single entities (graphemes, phonemes, word forms, etc.), which follow several empirical distributions (Zipf-Mandelbrot, Geometric and Hypergeometric Distributions), and a ‘vertical’ one by the Menzerath-Altmann law. Together with the Köhlerian circle, a multiple structuring of language and texts has to be conceived of.
- Research Article
- 10.26907/2541-7738.2021.1.180-189
- Jan 1, 2021
- Uchenye Zapiski Kazanskogo Universiteta. Seriya Gumanitarnye Nauki
The Menzerath–Altmann law on the relationship between the length of linguistic units and the length of their components is one of the important laws of quantitative linguistics. This law results from the advanced organization of linguistic structures and is of great importance for the modern theory of language, which aims to reveal the relations between qualitative features and quantitative parameters of language. The validity of the Menzerath–Altmann law has been confirmed in a number of works on languages with different morphological structures. The main purpose of this paper is to test the Menzerath–Altmann law empirically on the Tatar language using various fiction texts (both poetry and prose). For the analyzed texts, we investigated the distribution of Tatar word forms by length, the observed values of the average syllable length depending on word length, the average syllable length values predicted by the model, and the model parameters. To assess the goodness of fit of the model, the coefficient of determination R² was used; for different texts it ranged from 0.676 to 0.999. It was concluded that G. Altmann’s formula is in good agreement with the Tatar data. For a number of texts, the model predicts not only a decrease in average syllable length with increasing word length (a monotonic function), but also a subsequent increase (a change in monotonicity).
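The change in monotonicity mentioned above can be illustrated with the complete three-parameter form of the law. The sketch below assumes the convention y = a·x^b·e^(-c·x); the sign convention and the parameter values are hypothetical and are not taken from the paper. Under this convention dy/dx = y·(b/x - c), which vanishes at x* = b/c, so with b < 0 and c < 0 the curve falls up to x* and rises afterwards.

```python
import math

def mal(x, a, b, c):
    """Complete Menzerath-Altmann form, assumed here as y = a * x**b * exp(-c * x)."""
    return a * x ** b * math.exp(-c * x)

def turning_point(b, c):
    """Word length x* = b / c where dy/dx = y * (b / x - c) equals zero."""
    return b / c

# Hypothetical parameters chosen only to show a minimum inside the data range.
a, b, c = 3.0, -0.5, -0.1
x_star = turning_point(b, c)

for x in range(1, 9):
    marker = "  <- minimum" if abs(x - x_star) < 1e-9 else ""
    print(f"{x}: {mal(x, a, b, c):.4f}{marker}")
```

With these illustrative values the predicted mean syllable length decreases up to a word length of about 5 and increases beyond it, which is the kind of behaviour the abstract describes for some texts.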
- Research Article
- 10.2478/jazcas-2025-0032
- Jun 1, 2025
- Journal of Linguistics/Jazykovedný casopis
The study investigates the relationship between word length and phoneme sonority in six languages across diverse language families. Building on the principle of least effort and the Menzerath-Altmann law, the research analyzes phoneme sonority using translated New Testament texts in Bilua, Bola, Czech, Gagauz, Jamamadi, and Tongan. The findings reveal that in languages with complex syllables, the tendency of longer words to contain shorter syllables (consistent with the Menzerath-Altmann law) results in a higher proportion of vowels, thereby increasing the mean phoneme sonority. In contrast, languages with simple syllable structures exhibit either a decrease in mean phoneme sonority or no clear trend. Further, mean consonant sonority increases with word length in Bilua, Czech, and Gagauz, while no clear trend is observed in Bola, Jamamadi, and Tongan. Conversely, mean vowel sonority increases with word length in Bola, Jamamadi, and Tongan, but remains stable or decreases in Bilua, Czech, and Gagauz. Overall, the analysis reveals consistent patterns linking word length and sonority across all six languages.
- Research Article
15
- 10.1016/j.biosystems.2011.11.010
- Dec 16, 2011
- Biosystems
Random models of Menzerath–Altmann law in genomes
- Research Article
5
- 10.1080/09296174.2022.2027657
- Jan 15, 2022
- Journal of Quantitative Linguistics
Notwithstanding theoretical simulations of the distinctive cognitive processes and load of consecutive (CI) and simultaneous interpreting (SI), quantitative linguistic inquiry into their outputs is needed for solid empirical evidence. As a fundamental law of quantitative linguistics, the Menzerath–Altmann Law (MAL) mirrors the economic processing of linguistic information and the complex dynamic language system. Given its extensive validation at various linguistic levels and the predictive power of its parameters in register, language and authorship differentiation, MAL is worth applying to interpreting studies. We investigate whether interpreted languages follow the MAL and whether the different MAL fitting models reveal a varied cognitive load of CI versus SI. Results show that (1) both CI and SI outputs follow the MAL; (2) SI processing involves more diversified structural information and shows a greater tendency than CI processing to shorten the clauses of a sentence as sentence length increases, expressed by significantly higher a and lower b in SI models than in CI models. Our findings suggest that the disparate language representations are shaped by cognitive capacity limitations and interpreting modalities, and reveal how the language system dynamically re-regulates and reorganizes linguistic information to accommodate environmental settings from the perspective of synergetic linguistics.
- Book Chapter
1
- 10.51305/icl.cz.9788076580336.01
- Jan 1, 2022
The aim of the paper is to test the validity of the Menzerath-Altmann law for Czech poems from K. J. Erben’s ballad collection Kytice z pověstí národních (A Bouquet of Folk Legends). We focus particularly on the relationship between word length and syllable length. The Menzerath-Altmann law predicts that the mean syllable length will be longer in shorter words. The parameters of the mathematical model of this law for poems are compared with those for prose texts.
- Research Article
12
- 10.1093/llc/fqab110
- Jan 19, 2022
- Digital Scholarship in the Humanities
The length of language units, such as word length or sentence length, plays a critical role in register classification studies. However, in this line of work, little attention has been paid to the relationship between the lengths of language units at different levels. The Menzerath–Altmann law (MAL) reflects the functional relationship between the lengths of linguistic units at different levels, and its parameters were shown to be register-sensitive. This article focuses on two interrelated questions based on the MAL: (1) whether there are variations in the hierarchical relationships between language units at different levels and (2) whether such variations will influence register classifications. The results based on written Chinese show that (1) the MAL fittings at the ‘sentence > clause > word’ levels outperform that at the ‘clause > word > character’ levels and (2) the classifications based on two registers, i.e., Press (reportage) and Science (academic prose), demonstrate that the fitting parameters at the ‘sentence > clause > word’ levels also outperform those at the ‘clause > word > character’ levels. These indicate that the variations of hierarchical relationships between language units at different levels should be considered in register analysis. Further interpretations were given from perspectives of the information-theoretic principle and language evolution.
- Research Article
1
- 10.1080/09296174.2025.2545052
- Sep 27, 2025
- Journal of Quantitative Linguistics
The Menzerath-Altmann law predicts an inverse relationship between the lengths of a linguistic unit and of its parts. As a relationship between word length and the mean syllable length, it has been shown to be valid in many languages. However, we present several languages in which the mean syllable length does not decrease with increasing word length. These languages have simple syllables (mostly only of CV and V structure). This behaviour is explained as a consequence of the horror aequi principle, according to which language avoids similar units close to each other. The implications for the general validity of the Menzerath-Altmann law are discussed.
- Research Article
4
- 10.1080/09296174.2023.2259937
- Sep 27, 2023
- Journal of Quantitative Linguistics
According to the Menzerath-Altmann law, longer language constructs consist, on average, of shorter constituents. It is most often studied at the level of words and syllables (the mean syllable length gets shorter with the increasing word length). Its validity at this level was corroborated in several languages. However, it was claimed that Chinese is an exception with respect to the validity of the Menzerath-Altmann law. We show that the law is valid if word types are considered, while the behaviour of word tokens is different. This difference can be explained by the fact that the Zipf law of abbreviation is valid not only for words but also for syllables (shorter syllables are used more frequently).
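The type/token distinction above can be made concrete with a small computation. The sketch below is only an illustration: the syllabified "words" are invented strings rather than Chinese or any real data, letters stand in for phonemes, and the function name is hypothetical. It computes the mean syllable length per word length once over all tokens and once over unique types.

```python
from collections import defaultdict

def mean_syllable_length(words):
    """Map word length (in syllables) to mean syllable length (in letters).

    `words` is an iterable of tuples of syllable strings; each letter is
    treated as one phoneme, which is a simplifying assumption.
    """
    totals = defaultdict(lambda: [0, 0])  # length -> [phonemes, syllables]
    for syllables in words:
        n = len(syllables)
        totals[n][0] += sum(len(s) for s in syllables)
        totals[n][1] += n
    return {n: ph / syl for n, (ph, syl) in sorted(totals.items())}

# Invented toy corpus: one short-syllable word is very frequent.
tokens = [("ba",)] * 5 + [("tran",), ("ka", "to"), ("mel", "dor")]

by_token = mean_syllable_length(tokens)      # frequencies counted
by_type = mean_syllable_length(set(tokens))  # each word form counted once

print("tokens:", by_token)
print("types: ", by_type)
```

In this toy corpus the type-based means fall from 3.0 to 2.5 as word length grows, while the token-based means rise from about 2.33 to 2.5, because the frequent one-syllable word has a short syllable. This mirrors, in miniature, the explanation offered in the abstract: frequent use of short syllables can mask the law in token counts.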
- Research Article
11
- 10.1080/09296174.2018.1424493
- Jan 24, 2018
- Journal of Quantitative Linguistics
This paper first discusses the Menzerath-Altmann law in general and then shows that the law is valid in spoken Czech. In particular, the relation between word length (measured in the number of syllables) and the mean syllable length (measured in the number of phonemes) is investigated. In addition, for words and word stems which, according to the official norms of the Czech language, begin with the phoneme /o/, we model the relation between the relative occurrence of prothetic /v/ and word length in syllables.
- Research Article
6
- 10.1016/j.langsci.2023.101554
- May 12, 2023
- Language Sciences
How does language evolve as a multi-level system? A quantitative exploration of written Chinese
- Research Article
7
- 10.1515/lingvan-2022-0048
- Sep 26, 2022
- Linguistics Vanguard
The Menzerath-Altmann law (MAL) describes the relationship between the size of a construct and the size of its constituents: the larger the whole, the smaller its parts. Despite numerous investigations dedicated to MAL, few studies have examined the relationship syntactically, especially at the clause level. The present study investigates three units in which clauses in English can be measured, i.e., argument, phrase, and word, by fitting MAL to the relationship between the size of the clause and its constituents. Results show that (1) clause length in phrases can be well fitted by probability distributions, while the goodness-of-fit is less favorable for clause length in arguments and words; (2) MAL holds reasonably well between the size of the clause in phrases and of the phrase in words under some conditions, i.e., within a specific range of construct size and text genres; (3) the phrase, a notion proposed by Mačutek, Čech & Milička (2017, Menzerath-Altmann law in syntactic dependency structure, Proceedings of the fourth international conference on dependency linguistics (Depling 2017), 100–107) under the theoretical framework of dependency grammar, is the most appropriate of the three measurement units to serve as the neighboring unit of the clause. These findings may shed light on the features of syntactic structures and lead to a better understanding of the human language system.
- Research Article
12
- 10.1080/09296174.2016.1142328
- Apr 2, 2016
- Journal of Quantitative Linguistics
In this paper, we experimentally study, within quantitative linguistics, the degree to which the length of a short text affects its comprehensiveness and readability. Quantitative linguistics focuses mainly on the analysis of large text collections, and one of its major scientific theories is the Menzerath-Altmann law. We attempt to define a quantitative analysis framework for short texts of approximately one or two sentences, because such texts are considered very important in many scientific fields. To this end, a statistical testing process for coherence based on three variables was created for short texts and implemented through experimental and statistical evaluation. The statistical results showed that short text coherence, comprehensiveness and readability are fully achieved in short texts consisting of 14 words when three predetermined variables are associated, and vice versa. To prove this hypothesis, the Vector Space Model and Kendall’s Coefficient of Concordance were used. The assessment of the statistical results concluded that the hypothesis can be fully met for a number of cases with a probability p > 99%. Moreover, the experiment used short texts in English, but language was shown to be irrelevant. To corroborate this, a smaller-scale experiment with short texts in German was conducted, and it confirmed the hypothesis that the proposed model can be applied to all short texts regardless of their linguistic origin.
- Research Article
2
- 10.15587/1729-4061.2021.238743
- Oct 31, 2021
- Eastern-European Journal of Enterprise Technologies
This research aims to identify parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. We consider the binary classification of words according to parts of speech, and we study the most popular well-known machine learning algorithms. We defined 7 dictionaries and tagged 135 million words in Kazakh, and 9 dictionaries and 50 million words in Turkish. The main problem considered in the paper is to create algorithms for building the dictionaries of the so-called Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages, using machine learning techniques. The research focuses on the review and comparison of machine learning algorithms and methods that have achieved results on various natural language processing tasks, such as determining grammatical categories. For the LGP system to operate, a dictionary is created in which a connector is indicated for each word, that is, the type of connection that can be created using this word. The authors considered methods of filling in LGP dictionaries using machine learning. The complexities of natural language processing do not exclude the possibility of identifying narrower tasks that can already be solved algorithmically, for example, determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. Thus, taking into account all word forms for each word in the Kazakh and Turkish languages increases the complexity of text processing by an order of magnitude.
- Research Article
3
- 10.2478/jazcas-2021-0037
- Dec 1, 2021
- Journal of Linguistics/Jazykovedný casopis
It is shown that the mean morpheme length (measured in phonemes) decreases with the increasing length of word types (in morphemes) in Czech texts, i.e., these language units behave according to the Menzerath-Altmann law. The law is not valid in general for word tokens. Some hints towards an interpretation of parameters are presented.