Abstract

The current study presents a New General Service List (new-GSL), which is a result of robust comparison of four language corpora (LOB, BNC, BE06, and EnTenTen12) of the total size of over 12 billion running words. The four corpora were selected to represent a variety of corpus sizes and approaches to representativeness and sampling. In particular, the study investigates the lexical overlap among the corpora in the top 3,000 words based on the average reduced frequency (ARF), which is a measure that takes into consideration both frequency and dispersion of lexical items. The results show that there exists a stable vocabulary core of 2,122 items (70.7%) among the four corpora. Moreover, these vocabulary items occur with comparable ranks in the individual wordlists. In producing the new-GSL, the core vocabulary items were combined with new items frequently occurring in the corpora representing current language use (BE06 and EnTenTen12). The final product of the study, the new-GSL, consists of 2,494 lemmas and covers between 80.1 and 81.7 per cent of the text in the source corpora.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.