A paradoxical property of the monkey book

Sebastian Bernhardsson, Seung Ki Baek, Petter Minnhagen

doi:10.1088/1742-5468/2011/07/p07013

Abstract

A ‘monkey book’ is a book consisting of a random sequence of letters and blanks, where a group of letters surrounded by two blanks is defined as a word. We compare the statistics of the word distribution for a monkey book to real books. It is shown that the word distribution statistics for the monkey book is different and quite distinct from a typical real book. In particular, the monkey book obeys Heaps’ power law to an extraordinarily good approximation, in contrast to the word distributions for real books, which deviate from Heaps’ law in a characteristic way. This discrepancy is traced to the different properties of a ‘spiked’ distribution and its smooth envelope. The somewhat counter-intuitive conclusion is that a ‘monkey book’ obeys Heaps’ power law precisely because its word-frequency distribution is not a smooth power law, contrary to the expectation based on simple mathematical arguments that if one is a power law, so is the other.
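The definition of a monkey book can be made concrete with a minimal Python sketch. It types random letters and blanks, splits the result into words, and records how the number of distinct words grows with the total number of words, which is the quantity Heaps' law describes. The alphabet size, the blank probability, the seed, and the function names `monkey_book` and `heaps_curve` are illustrative assumptions, not values or names taken from the paper.

```python
import random


def monkey_book(n_chars, alphabet="abcd", p_blank=0.2, seed=0):
    """Return the words of a random text of n_chars keystrokes.

    Assumptions (not from the paper): a 4-letter alphabet, equally likely
    letters, and a blank typed with probability p_blank.
    """
    rng = random.Random(seed)
    chars = [
        " " if rng.random() < p_blank else rng.choice(alphabet)
        for _ in range(n_chars)
    ]
    # str.split() with no argument splits on runs of blanks and drops empty
    # strings, so each word is a maximal group of letters between blanks.
    # The first and last word may touch the text boundary instead of a blank.
    return "".join(chars).split()


def heaps_curve(words):
    """Return a list whose i-th entry is the number of distinct words
    among the first i + 1 words of the text."""
    seen = set()
    curve = []
    for word in words:
        seen.add(word)
        curve.append(len(seen))
    return curve


words = monkey_book(200_000)
curve = heaps_curve(words)

# Print the number of distinct words at a few text lengths.
for n_words in (100, 1_000, 10_000, len(words)):
    print(n_words, curve[n_words - 1])
```

Plotting `curve` against the word index on log-log axes is one way to inspect how closely the growth follows a straight line, which is what a Heaps-type power law would look like.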

Full Text