Abstract

This paper examines the distribution of word length and stem length in Mongolian, using both dynamic language resources (a corpus of 1 million Mongolian word tokens) and static ones (an orthographic Mongolian dictionary and a Mongolian stem dictionary). The results show that Mongolian word and stem lengths follow distributions of the Poisson family. Specifically, word lengths in the dynamic corpus follow the Dacey-Poisson distribution, and all the others follow the Conway-Maxwell-Poisson distribution. In addition, Mongolian word lengths are influenced by word frequencies, broadly in line with Zipf’s Principle of Least Effort. Experiments fitting a power-function relationship between Mongolian word lengths and word frequencies, using individual short texts, continuous long texts, and fixed-length texts, indicate that individual texts of fixed length (about 2000 words) yield the best fits.
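As a rough illustration of what fitting a power function between word length and word frequency can look like, the sketch below fits y = a * x^b by ordinary least squares on log-transformed values. It is not the authors' procedure: the function name `fit_power_law`, the choice of frequency as the independent variable, and the data are all assumptions made for illustration, and the data are synthetic rather than taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Hypothetical helper for illustration; all inputs must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the regression line in log-log space.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data only: lengths generated from an exact power law,
# so the fit should recover the generating parameters.
freqs = [1, 2, 5, 10, 50, 100, 1000]
lengths = [8.0 * f ** -0.2 for f in freqs]
a, b = fit_power_law(freqs, lengths)
```

A negative exponent `b` in such a fit would correspond to more frequent words tending to be shorter, which is the pattern associated with the Principle of Least Effort.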
