Abstract

Based on the Chinese-English Sentence-Aligned Bilingual Corpus constructed by the Institute of Automation and the Institute of Computing Technology of the Chinese Academy of Sciences, this paper investigates the inter-textual type/token relationship in Chinese and English, tests how well BRUNET’s model fits the vocabulary growth curves of Chinese and English texts, and explores the growth patterns of hapax legomena in the two languages. The results show that vocabulary growth in Chinese and English follows a similar pattern, rising sharply at first and then more slowly, although Chinese types initially rise more sharply than English types. BRUNET’s model is powerful enough to fit the inter-textual type/token relationship of both languages. The study also finds that Chinese has far fewer hapax legomena than English, and that as the number of tokens increases, the hapax legomena in both languages follow a growth pattern similar to that of their type/token relationship. From the crossing point of the two curves (about 2,500,000 cumulative word tokens) onward, however, the cumulative number of Chinese hapax legomena becomes much smaller than that of English.
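To make the quantities in the abstract concrete, the following is a minimal sketch of how cumulative type and hapax legomena counts can be tracked as tokens accumulate. It is an illustration only, not the procedure used in the paper: the function name `growth_curves`, the `step` parameter, and the toy English sentence are assumptions introduced here, and Chinese text is assumed to have been segmented into word tokens beforehand.

```python
from collections import Counter

def growth_curves(tokens, step=1):
    """Return (N, types, hapax) tuples sampled every `step` tokens.

    N     : cumulative number of tokens read so far
    types : number of distinct word types seen so far
    hapax : number of types seen exactly once so far
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    counts = Counter()
    hapax = 0
    points = []
    for n, tok in enumerate(tokens, start=1):
        counts[tok] += 1
        if counts[tok] == 1:
            hapax += 1      # a new type starts out as a hapax
        elif counts[tok] == 2:
            hapax -= 1      # a second occurrence ends its hapax status
        if n % step == 0:
            points.append((n, len(counts), hapax))
    return points

# Toy, pre-tokenized sample (an assumption; real input would be corpus text).
sample = "the cat sat on the mat and the dog sat too".split()
for n, types, hapax in growth_curves(sample, step=4):
    print(n, types, hapax)
```

Plotting the second and third values against the first for each language would give vocabulary growth and hapax growth curves of the kind the abstract compares.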
