Abstract

In spontaneous speech data, lexical richness is generally operationalized by measures based on the relation between the number of types and the number of tokens, of which the Type/Token Ratio (TTR) is the best known. This article discusses the reliability and validity of different measures of lexical richness in research on various kinds of language data and in computer simulations, and examines the behaviour of these measures in spontaneous speech data of first language and second language children learning Dutch, aged four to seven, compared with their lexical abilities as measured by tests. The results show that neither the validity nor the reliability of the measures was satisfactory, especially in the case of the widely applied TTR. Initially, the number of types, or lemmas, and the Guiraud and Uber indexes seem to be adequate measures. However, in later stages of vocabulary acquisition (from 3000 words on) none of them is valid. It is suggested that more effective measures of lexical richness might be based not on the distribution of, or the relation between, the types and tokens, but on the degree of difficulty of the words used, as measured by their (levels of) frequency in daily language input.
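
For readers unfamiliar with the measures named above, the following is a minimal sketch of how they are commonly defined, not the procedure used in the article. It assumes the usual formulations: TTR as types divided by tokens, the Guiraud index as types divided by the square root of tokens, and the Uber index as (log tokens)^2 / (log tokens - log types). The function name `lexical_richness`, the whitespace tokenization, and the counting of word forms rather than lemmas are illustrative assumptions.

```python
import math


def lexical_richness(tokens):
    """Return type/token-based richness measures for a list of word tokens.

    Assumption: a "type" is a distinct word form; no lemmatization is applied.
    """
    n_tokens = len(tokens)
    if n_tokens == 0:
        raise ValueError("at least one token is required")
    n_types = len(set(tokens))

    # Type/Token Ratio: proportion of distinct words among all words.
    ttr = n_types / n_tokens

    # Guiraud index: types divided by the square root of tokens.
    guiraud = n_types / math.sqrt(n_tokens)

    # Uber index: (log N)^2 / (log N - log V); undefined when every token
    # is a distinct type, so infinity is returned in that case.
    if n_types == n_tokens:
        uber = math.inf
    else:
        log_n = math.log10(n_tokens)
        log_v = math.log10(n_types)
        uber = log_n ** 2 / (log_n - log_v)

    return {
        "tokens": n_tokens,
        "types": n_types,
        "ttr": ttr,
        "guiraud": guiraud,
        "uber": uber,
    }


if __name__ == "__main__":
    # Hypothetical sample utterance, split on whitespace.
    sample = "the dog saw the cat and the cat saw the dog".split()
    for name, value in lexical_richness(sample).items():
        print(name, value)
```

All three measures depend only on the counts of types and tokens, which is the property the abstract identifies as a limitation: none of them takes into account how difficult or infrequent the words themselves are.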
