Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes

Linyuan Lü, Zi-Ke Zhang, Tao Zhou

doi:10.1038/srep01082

Abstract

Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages such as Chinese, Japanese and Korean. These languages consist of characters and have very limited dictionary sizes. Extensive experiments show that: (i) The character frequency distribution follows a power law with exponent close to one, at which the corresponding Zipf's exponent diverges. Indeed, the character frequency decays exponentially in the Zipf's plot. (ii) The number of distinct characters grows with the text length in three stages: it grows linearly in the beginning, then turns to a logarithmic form, and eventually saturates. A theoretical model of the writing process is proposed, which embodies the rich-get-richer mechanism and the effects of limited dictionary size. Experiments, simulations and analytical solutions agree well with each other. This work refines the understanding of Zipf's and Heaps' laws in human language systems.
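
The link between a frequency exponent close to one and a diverging Zipf's exponent can be made explicit with a standard rank-frequency argument. The following is a sketch under assumed notation that does not appear in the text above: k is the number of occurrences of a character, p(k) the frequency distribution with exponent β, z(r) the frequency of the character at rank r with Zipf's exponent α, k_max the largest frequency, and c a positive constant. The rank of a character is taken to be proportional to the number of characters that occur at least as often.

```latex
% Assumed notation (not from the source): k, p(k), beta, z(r), alpha, k_max, c
\begin{align}
  p(k) &\propto k^{-\beta}, \qquad z(r) \propto r^{-\alpha}, \\
  r(k) &\propto \int_{k}^{k_{\max}} p(k')\,\mathrm{d}k', \\
  \beta > 1,\ k \ll k_{\max}: \quad
  r(k) &\propto k^{1-\beta}
  \;\Longrightarrow\;
  z(r) \propto r^{-1/(\beta-1)},
  \quad \alpha = \frac{1}{\beta-1} \xrightarrow{\;\beta \to 1^{+}\;} \infty, \\
  \beta = 1: \quad
  r(k) &\propto \ln\frac{k_{\max}}{k}
  \;\Longrightarrow\;
  z(r) = k_{\max}\, e^{-r/c}.
\end{align}
```

Under these assumptions, the case β = 1 yields an exponential rather than a power-law decay in the rank, which is consistent with the exponential decay in the Zipf's plot reported in the abstract.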

Highlights

Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages such as Chinese, Japanese and Korean.
An extensive analysis of Chinese, Japanese and Korean books reveals even more complicated phenomena: the character frequency distribution follows a power law with exponent close to one, at which the corresponding Zipf's exponent diverges, and the character frequency decays exponentially in the Zipf's plot.
The number of distinct characters grows with the text length in three stages: it grows linearly in the beginning, turns to a logarithmic form, and eventually saturates.

Summary

Introduction

Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages such as Chinese, Japanese and Korean. These languages consist of characters and have very limited dictionary sizes. Lü et al. [33] pointed out that in a growing system, if the appearing frequencies of elements obey Zipf's law with a stable exponent, the number of distinct elements grows in a complicated way, for which Heaps' law is only an asymptotic approximation. In the character-based languages, the number of distinct characters grows with the text length in three stages: it grows linearly in the beginning, turns to a logarithmic form, and eventually saturates. All these previously unreported phenomena result from the combination of the rich-get-richer mechanism and the limited dictionary sizes, which is verified by a theoretical model.
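
The interplay of a rich-get-richer mechanism with a limited dictionary can be illustrated with a small simulation. The following is a minimal sketch and not necessarily the authors' exact model: it assumes that each new character is chosen with probability proportional to its current count plus a constant eps. The function names, the parameter eps, and the chosen sizes are all hypothetical and introduced only for illustration.

```python
import random
from collections import Counter


def simulate_writing(vocab_size, text_length, eps=1.0, seed=0):
    """Generate a text where character i is picked with weight (count_i + eps).

    Assumption (illustrative, not taken from the paper): the weight
    count_i + eps is sampled as a mixture. With probability
    vocab_size * eps / (t + vocab_size * eps) a character is drawn uniformly
    from the whole dictionary; otherwise a previously written character is
    copied uniformly from the text, which favours frequent characters.
    """
    rng = random.Random(seed)
    text = []       # characters written so far, as integer ids
    seen = set()    # distinct characters used so far
    growth = []     # growth[t] = number of distinct characters after t + 1 steps
    for t in range(text_length):
        p_uniform = vocab_size * eps / (t + vocab_size * eps)
        if rng.random() < p_uniform:
            ch = rng.randrange(vocab_size)   # uniform draw from the dictionary
        else:
            ch = rng.choice(text)            # rich-get-richer copy step
        text.append(ch)
        seen.add(ch)
        growth.append(len(seen))
    return text, growth


def rank_frequencies(text):
    """Return character counts sorted in decreasing order (the Zipf's plot data)."""
    return sorted(Counter(text).values(), reverse=True)


if __name__ == "__main__":
    text, growth = simulate_writing(vocab_size=500, text_length=200_000)
    freqs = rank_frequencies(text)
    print("distinct characters after 100, 10000, 200000 steps:",
          growth[99], growth[9_999], growth[-1])
    print("top five character counts:", freqs[:5])
```

Plotting growth against the text length on logarithmic axes, and the output of rank_frequencies on semi-logarithmic axes, is one way to compare such a toy process with the three-stage growth and the exponential decay described above. The number of distinct characters can never exceed vocab_size, which is what forces the eventual saturation in this sketch.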

Journal: Scientific Reports
Publication Date: Jan 30, 2013
Citations: 57
License type: CC BY 3.0

Similar Papers

A scaling law beyond Zipf's law and its relation to Heaps' law
Francesc Font-Clos ... Álvaro Corral
New Journal of Physics | VOL. 15
01 Sep 2013

Rank-Frequency Analysis of Characters in Garhwali Text: Emergence of Zipf's Law
...
Current Science | VOL. 110
01 Feb 2016

Zipf's Law Leads to Heaps' Law: Analyzing Their Relation in Finite-Size Systems
Linyuan Lü ... Zi-Ke Zhang
PLoS ONE | VOL. 5
02 Dec 2010

Variation of Zipf's exponent in one hundred live languages: A study of the Holy Bible translations
Ali Mehri ... Maryam Jamaati
Physics Letters A | VOL. 381
02 Jun 2017
