A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts

Qinglan Sun, Charles H Davis, Debora Shaw

doi:10.1002/(sici)1097-4571(1999)50:3<280::aid-asi11>3.0.co;2-h

Abstract

A simpler model is proposed for estimating the frequency of any same-frequency words and identifying the boundary point between high-frequency words and low-frequency words in a text. The model, based on a “maximum ranking method,” assigns ranks to the words and estimates word frequency by the formula: $\mathrm{Int}\left[\left(-1 + (1 + 4D/I_{n+1})^{1/2}\right)/2\right] > n^* \geq \mathrm{Int}\left[\left(-1 + (1 + 4D/I_n)^{1/2}\right)/2\right]$. The boundary value between high-frequency and low-frequency words is obtained by taking the square root of the number of different words in the text: $n^* = D^{1/2}$. This straightforward model was used successfully with both English and Chinese texts, demonstrating that the frequency of words and the number of same-frequency words depend only on the vocabulary of a text (the number of different words), not on its length. Like Zipf's Law, the model may be universally applicable.
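The quantities named in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: the tokenizer is a simple lowercase regular expression, the function names are hypothetical, and `bracket` only evaluates the bracketed expression literally for a caller-supplied value of I, since the abstract does not define the symbol. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: words are runs of ASCII letters and apostrophes, lowercased.
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_size(text: str) -> int:
    # D: the number of different words in the text.
    return len(set(tokenize(text)))


def boundary(d: int) -> float:
    # n* = D^(1/2), the boundary value stated in the abstract.
    return math.sqrt(d)


def bracket(d: int, i: float) -> int:
    # Literal evaluation of Int[(-1 + (1 + 4D/I)^(1/2)) / 2].
    # The meaning of I is not given in the abstract; it is passed in as-is.
    return math.floor((-1 + math.sqrt(1 + 4 * d / i)) / 2)


def same_frequency_counts(text: str) -> Counter:
    # Observed number of different words at each frequency of occurrence.
    word_freqs = Counter(tokenize(text))
    return Counter(word_freqs.values())
```

Comparing `same_frequency_counts` for two texts of different length but similar vocabulary size is one way to probe the abstract's claim that the counts depend on D rather than on text length.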


Journal: Journal of the American Society for Information Science. American Society for Information Science
Publication Date: Jan 1, 1999
Citations: 8

Similar Papers

A model for estimating the occurrence of same‐frequency words and the boundary between high‐ and low‐frequency words in texts
Qinglan Sun ... Charles H Davis
Journal of the American Society for Information Science. American Society for Information Science | VOL. 50
01 Jan 1998

Do children with Williams syndrome have unusual vocabularies?
Vesna Stojanovik ... Lizet Van Ewijk
Journal of neurolinguistics | VOL. 21
10 Aug 2007

Parametric effects of word frequency in memory for mixed frequency lists.
Lynn J Lohnas ... Michael J Kahana
Journal of Experimental Psychology: Human Learning & Memory | VOL. 39
01 Jan 2013

Frequency in Incidental Vocabulary Acquisition Research: An Undefined Concept and Some Consequences
Barry Lee Reynolds ... Constant Leung
TESOL Quarterly | VOL. 48
28 Oct 2014
