Abstract

Zipf's law states that if the words of a language are sorted in order of decreasing frequency of usage, a word's frequency is inversely proportional to its rank, that is, its sequence number in the list. The Zipf-Mandelbrot law is a more general formula that provides a better fit in the low-rank region. Among the several models that aim to explain this effect, Mandelbrot's model is one of the best known. It derives Zipf's law from the optimization of the information/cost ratio, but it leads to an unrealistic view of texts as random character sequences. In this article, a new modification of the model is proposed that is free from this drawback and allows the optimal information/cost ratio to be achieved via language evolution. It is demonstrated that the Zipf-Mandelbrot formula follows from this model, but that its two parameters are not independent. As a result, the formula cannot be convincingly fitted to actual word frequency distributions.
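
For reference, the two laws named above are commonly written as follows. The symbols are assumed notation, not taken from the article: f(r) is the frequency of the word of rank r, C is a normalizing constant, and a and b are the two parameters of the Zipf-Mandelbrot formula.

```latex
% Zipf's law: frequency inversely proportional to rank
f(r) = \frac{C}{r}

% Zipf-Mandelbrot law: two additional parameters, a (exponent) and b (rank shift)
f(r) = \frac{C}{(r + b)^{a}}
```

In this notation, Zipf's law is the special case a = 1, b = 0, and the shift b mainly affects the low-rank region, where r is comparable to b.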
