Productivity and Predictability for Measuring Morphological Complexity.

Ximena Gutierrez-Vasques, Victor Mijangos

doi:10.3390/e22010048

Open Access

Abstract

We propose a quantitative approach for measuring the morphological complexity of a language based on text. Several corpus-based methods have focused on measuring the different word forms that a language can produce. We take into account not only the productivity of morphological processes but also their predictability. We use a language model that predicts the probability of sub-word sequences within a word; we calculate the entropy rate of this model and use it as a measure of the predictability of the internal structure of words. Our results show that it is important to integrate these two dimensions when measuring morphological complexity, since a language can be complex under one measure but simpler under another. We calculated the complexity measures on two different parallel corpora for a typologically diverse set of languages. Our approach is corpus-based and does not require linguistically annotated data.
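
The two dimensions named in the abstract can be illustrated with a minimal sketch. It assumes that productivity is approximated by a type-token ratio over word forms and that predictability is approximated by the conditional entropy of a count-based model over character n-grams. The count-based model is a simple stand-in chosen for illustration, not the language model used in the paper, and the function names `type_token_ratio`, `split_units`, and `entropy_rate` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def type_token_ratio(tokens):
    """Distinct word forms divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens)


def split_units(word, n):
    """Split a word into overlapping character n-grams.

    "<" and ">" are assumed boundary markers added for this sketch.
    """
    padded = "<" + word + ">"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def entropy_rate(tokens, n=1):
    """Estimate the entropy (in bits) of the next sub-word unit given the previous one.

    Uses maximum-likelihood transition counts between adjacent n-grams
    inside each word. Lower values mean more predictable word structure.
    """
    transitions = defaultdict(Counter)
    for word in tokens:
        units = split_units(word, n)
        for prev, nxt in zip(units, units[1:]):
            transitions[prev][nxt] += 1

    total = sum(sum(counter.values()) for counter in transitions.values())
    h = 0.0
    for counter in transitions.values():
        prev_total = sum(counter.values())
        for count in counter.values():
            p_joint = count / total        # P(prev, next)
            p_cond = count / prev_total    # P(next | prev)
            h -= p_joint * math.log2(p_cond)
    return h


# Toy corpus, made up for illustration only.
corpus = "the cat walks the cats walked the walking cat".split()
ttr = type_token_ratio(corpus)
h1 = entropy_rate(corpus, n=1)  # unigram (single character) units
h3 = entropy_rate(corpus, n=3)  # trigram units
```

On a real corpus the two values can disagree: a text can show many distinct word forms (high type-token ratio) while the transitions inside its words remain highly predictable (low entropy).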

Highlights

Languages of the world differ from each other in unpredictable ways [1,2].
We use the notations H1 and H3 for the entropy rate calculated with unigrams and trigrams, respectively; TTR is the type-token relationship.
To combine the different complexity dimensions, we ranked the languages according to each measure and averaged the ranks obtained for each language. Since the languages were ranked from most to least complex, we used the inverse of the average rank so that the combined value is consistent with the other complexity measures (0 for the least complex, 1 for the most complex).
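
The rank-combination step described in the last highlight can be sketched as follows. This is an illustrative reading, assuming that a higher score means higher complexity under every measure and that ties are broken by input order; the source does not say how ties are handled. The names `rank_descending` and `combined_complexity`, the language labels, and all scores are made up for the example and are not results from the paper.

```python
def rank_descending(scores):
    """Map each language to its rank, where rank 1 is the highest score.

    Ties keep their input order (an assumption of this sketch).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {lang: position for position, lang in enumerate(ordered, start=1)}


def combined_complexity(measures):
    """Combine several measures as the inverse of the average rank.

    measures: dict mapping a measure name to a dict of language -> score.
    A language ranked first under every measure gets 1.0; values move
    toward 0 as the average rank grows.
    """
    ranks = [rank_descending(scores) for scores in measures.values()]
    languages = list(ranks[0])
    return {
        lang: 1.0 / (sum(r[lang] for r in ranks) / len(ranks))
        for lang in languages
    }


# Hypothetical scores for three placeholder languages.
toy_measures = {
    "TTR": {"lang_a": 0.5, "lang_b": 0.3, "lang_c": 0.1},
    "H3": {"lang_a": 0.2, "lang_b": 0.9, "lang_c": 0.1},
}
combined = combined_complexity(toy_measures)
```

Working on ranks rather than raw scores makes measures with different scales comparable, at the cost of discarding how far apart two languages are under any single measure.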

Summary

Introduction

Languages of the world differ from each other in unpredictable ways [1,2]. The study of language complexity focuses on determining how these variations occur in terms of complexity (size of grammar elements, internal structure of the grammar). Conceptualizing and quantifying linguistic complexity is not an easy task, since many quantitative and qualitative dimensions must be taken into account [3]. Several corpus-based methods are successful in capturing the number and variety of the morphological elements of a language by measuring the distribution of words over a corpus. However, they may not capture other complexity dimensions, such as the predictability of the internal structure of words. A language can be considered complex because it has rich morphological productivity, i.e., a great number of morphs can be encoded into a single word. Yet the combinatorial structure of these morphs in the word formation process can have less uncertainty than in other languages, i.e., it can be more predictable.

Journal: Entropy
Publication Date: Dec 30, 2019
Citations: 3
License type: CC BY 4.0
