Abstract

Scaling laws characterize diverse complex systems in a broad range of fields, including physics, biology, finance, and social science. Human language is another example of a complex system, one defined by the organization of words. Studies of written texts have shown that scaling laws characterize the occurrence frequency of words, word rank, and the growth in the number of distinct words with increasing text length. However, these studies have concentrated mainly on Western linguistic systems, and the laws that govern the lexical organization, structure, and dynamics of the Chinese language remain poorly understood. Here we study a database of Chinese- and English-language books. We report that three distinct scaling laws characterize word organization in the Chinese language. We find that these scaling laws have different exponents and crossover behaviors compared to English texts, indicating differences in word organization and in the dynamics of words during text growth. We propose a stochastic feedback model of word organization and text growth, which successfully accounts for the empirically observed scaling laws with their corresponding scaling exponents and characteristic crossover regimes. Further, by varying key model parameters, we reproduce the differences in word organization and scaling laws between the Chinese and English languages. We also identify functional relationships between model parameters and the empirically observed scaling exponents, thus providing new insights into word organization and growth dynamics in the Chinese and English languages.

Highlights

  • Scaling laws have been discovered and investigated in many fields such as physics, biology, finance, geology, and sociology

  • We treat each Chinese character as a separate word because, in contrast to Western languages, where each word is composed of letters, characters in the Chinese language do not correspond to letters but often denote separate words, and the same Chinese character can act as a verb, noun, or adverb depending on its context in the sentence

  • To understand how the mechanism underlying word organization leads to the empirically observed scaling laws, and to study the difference between the Chinese and English languages, we introduce a stochastic feedback model that accounts for the probability of word occurrence and the growth in the number of new words with increasing text length
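
The word definition in the highlights above (one Chinese character per word) and the growth of distinct words with text length can be illustrated with a minimal sketch. The function names `tokenize` and `vocabulary_growth`, the restriction to the basic CJK Unified Ideographs block, and the simple rule used for English are assumptions made for illustration only; they are not taken from the paper's own preprocessing.

```python
def tokenize(text, chinese):
    """Split text into words (illustrative rule, not the paper's pipeline).

    Chinese: every character in the basic CJK Unified Ideographs block
    (U+4E00..U+9FFF) is one word; punctuation and other symbols are dropped.
    English: lowercase runs of alphabetic characters; anything else is a
    separator, so "don't" becomes "don" and "t".
    """
    if chinese:
        return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def vocabulary_growth(tokens):
    """Entry i is the number of distinct words among the first i + 1 tokens."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    zh = tokenize("我爱你，我爱他。", chinese=True)
    en = tokenize("The cat saw the dog.", chinese=False)
    print(zh, vocabulary_growth(zh))
    print(en, vocabulary_growth(en))
```

Plotting the output of `vocabulary_growth` against text length on log-log axes is one simple way to inspect how the number of distinct words grows as a text gets longer.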


Summary

Introduction

Scaling laws have been discovered and investigated in many fields such as physics, biology, finance, geology, and sociology. Our analyses show that while English texts exhibit a power law with a single exponent α = 1.05 ± 0.03 over the entire fitting range of word frequency rank r ∈ [3, 2 × 10³], the Chinese-language texts are characterized by a clear crossover in the Zipf scaling, from a regime with α₁ = 0.60 ± 0.07 for high and intermediate frequency ranks r ∈ [3, 100] to a second regime with … (figure: doi:10.1371/journal.pone.0168971.g001)
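
To make the fitted exponents concrete, the sketch below estimates a Zipf exponent α from a rank-frequency list by ordinary least squares on log-log axes, restricted to a chosen rank window such as r ∈ [3, 100]. This is an assumed, simple estimator chosen for illustration; the summary does not state which fitting procedure the authors used, and the function names `rank_frequency` and `zipf_exponent` are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Word frequencies sorted in descending order; index 0 is rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs, r_min, r_max):
    """Estimate alpha in f(r) ~ r**(-alpha) over ranks r_min..r_max.

    Ranks are 1-indexed and inclusive. The estimate is minus the
    least-squares slope of log f against log r (an illustrative choice).
    """
    r_max = min(r_max, len(freqs))
    xs = [math.log(r) for r in range(r_min, r_max + 1)]
    ys = [math.log(freqs[r - 1]) for r in range(r_min, r_max + 1)]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two ranks in the fitting window")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic frequencies that follow an exact power law with alpha = 0.6.
    synthetic = [1000.0 * r ** -0.6 for r in range(1, 201)]
    print(round(zipf_exponent(synthetic, 3, 100), 3))
```

A crossover of the kind described for the Chinese texts would appear as different values of `zipf_exponent` when the same frequency list is fitted over two separate rank windows.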

Results
Conclusion