Abstract

Zipf's law states that the frequency of a word token in a large corpus of natural language is inversely proportional to its frequency rank. The law is investigated here for two languages, English and Mandarin, and for n-gram word phrases as well as for single words. The law for single words is shown to be valid only for high-frequency words. However, when single words and n-gram phrases are combined into one list and ordered by frequency, the combined list follows Zipf's law accurately for all words and phrases, down to the lowest frequencies, in both languages. The Zipf curves for the two languages are then almost identical.
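As a rough illustration of the combined ranking described above, the following sketch (not the authors' code; the corpus, tokenisation, and n-gram range are placeholders) counts single words and n-gram phrases together, ranks them by frequency, and compares the observed frequencies with the Zipf prediction f(r) ∝ 1/r.

```python
# Illustrative sketch only: combine single-word and n-gram counts into one
# ranked frequency list and compare against a 1/rank Zipf curve.
from collections import Counter


def zipf_ranked_frequencies(tokens, max_n=3):
    """Count all n-grams for n = 1..max_n in one combined Counter,
    then return the frequencies sorted into descending (rank) order."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return sorted(counts.values(), reverse=True)


if __name__ == "__main__":
    # Toy corpus; a real test would use a large English or Mandarin corpus.
    tokens = "the cat sat on the mat and the cat sat on the hat".split()
    freqs = zipf_ranked_frequencies(tokens, max_n=2)
    c = freqs[0]  # fix the constant in f(r) = C / r from the rank-1 item
    for rank, f in enumerate(freqs, start=1):
        print(f"rank {rank:2d}  observed {f:2d}  Zipf prediction {c / rank:5.2f}")
```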
