Abstract

A ‘monkey book’ is a book consisting of a random sequence of letters and blanks, where a group of letters surrounded by two blanks is defined as a word. We compare the statistics of the word distribution for a monkey book with those of real books. It is shown that the word-distribution statistics for the monkey book are quite distinct from those of a typical real book. In particular, the monkey book obeys Heaps’ power law to an extraordinarily good approximation, in contrast to the word distributions for real books, which deviate from Heaps’ law in a characteristic way. This discrepancy is traced to the different properties of a ‘spiked’ distribution and its smooth envelope. The somewhat counter-intuitive conclusion is that a ‘monkey book’ obeys Heaps’ power law precisely because its word-frequency distribution is not a smooth power law, contrary to the expectation based on simple mathematical arguments that if one is a power law, so is the other.
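To make the construction concrete, the following is a minimal sketch of how a monkey book could be generated and how its Heaps curve (number of distinct words versus total number of words) could be measured. The alphabet size, blank probability, text length, and the function names `monkey_book` and `heaps_curve` are illustrative assumptions, not values or code taken from the paper.

```python
import math
import random


def monkey_book(n_chars, alphabet="abcd", p_blank=0.2, seed=0):
    """Return the words of a random text (all parameters are assumed values).

    Each character is a blank with probability p_blank, otherwise a
    uniformly chosen letter. Words are the groups of letters between blanks.
    """
    rng = random.Random(seed)
    chars = [
        " " if rng.random() < p_blank else rng.choice(alphabet)
        for _ in range(n_chars)
    ]
    # str.split() with no argument drops the empty strings that
    # consecutive blanks would otherwise produce.
    return "".join(chars).split()


def heaps_curve(words):
    """Return N(M): the number of distinct words among the first M words."""
    seen = set()
    curve = []
    for word in words:
        seen.add(word)
        curve.append(len(seen))
    return curve


words = monkey_book(200_000)
curve = heaps_curve(words)

# Crude two-point estimate of the log-log slope of N(M) over the last decade.
m_total = len(words)
m_start = m_total // 10
slope = math.log(curve[m_total - 1] / curve[m_start - 1]) / math.log(
    m_total / m_start
)
print(f"{m_total} words, {curve[-1]} distinct, log-log slope ~ {slope:.2f}")
```

If Heaps’ law holds, the slope estimated over different ranges of M should stay roughly constant; a two-point estimate like this is only a rough check, not a fit.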
