Abstract

What does the frequency of occurrence of different words in an article have to do with the number of times an article is cited? Or, for that matter, with the number of publications an author has? All of these (word frequency, citation frequency, and publication frequency) obey a ubiquitous distribution called Zipf's law. Zipf's law also applies to such diverse subjects as income distribution, firm size, and biological genera and species. In 1949, Zipf described a hyperbolic rank-frequency word distribution, which he fitted to a number of texts. He stated that if all unique words in a text are arranged (or ranked) in order of decreasing frequency of occurrence, the product of frequency and rank yields a constant that is approximately the same for every word in the text. The law has been shown to encompass many natural phenomena, and is equivalent to the distributions of Yule, Lotka, Pareto, Bradford, and Price. Such a ubiquitous empirical regularity suggests some universal principle. This article examines a number of theoretical derivations of the law, in order to show the relationship between the many attempts at ascertaining a theoretical justification for the phenomenon. We then briefly examine some of the ramifications of applying the law to the bibliographic database environment.
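Zipf's rank-frequency statement can be written as f(r) ≈ C / r, or equivalently r · f(r) ≈ C, where r is a word's rank and f(r) its frequency. The following minimal Python sketch ranks the words of a text by decreasing frequency, reports the product rank × frequency for each word, and estimates the exponent s in f(r) ∝ r^(-s) by a least-squares fit on log-log axes (Zipf's original law corresponds to s ≈ 1). The tokenizer, the function names `rank_frequency` and `zipf_exponent`, and the use of an ordinary least-squares fit are illustrative assumptions, not the method used in the article.

```python
import math
import re
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int, int]]:
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    # Assumed tokenizer (hypothetical): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by decreasing frequency; break ties alphabetically so ranks are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


def zipf_exponent(rows: list[tuple[int, str, int, int]]) -> float:
    """Estimate s in f(r) ~ r**(-s) via least squares on (log r, log f)."""
    if len(rows) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank, _, _, _ in rows]
    ys = [math.log(freq) for _, _, freq, _ in rows]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Ordinary least-squares slope of log-frequency against log-rank.
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    # Zipf's law predicts a slope of about -1, i.e. an exponent s of about 1.
    return -slope


if __name__ == "__main__":
    # Synthetic toy text built so that frequencies follow 12 / rank exactly.
    sample = " ".join(["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["law"] * 3)
    for row in rank_frequency(sample):
        print(row)  # rank * frequency is 12 for every word
    print(round(zipf_exponent(rank_frequency(sample)), 6))
```

On this constructed sample every row has rank × frequency equal to 12 and the fitted exponent is 1.0; on real texts the product is only approximately constant, which is the empirical regularity the abstract refers to.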
