Abstract

While word-frequency lists have been commonly used as indexes of word usefulness, their role as a proxy for learner word knowledge is unclear. Word knowledge in a structured sample (N = 625) of Japanese university-level EFL learners, operationalized using dichotomous Rasch modeling of test-item data, was used as an external reference criterion to investigate two issues germane to the development of word lists representing learner knowledge in EFL contexts: 1) the definition of word and 2) the choice of reference corpus. On the former, corpus-derived word-frequency lists based on orthographic word forms, flemmas, or word families were generated from 18 different corpora. Lists using flemma-based word groupings correlated more highly with learner population word knowledge than lists using word-family-based groupings across all 18 sets of word lists tested. On the latter, lists derived from corpora of spontaneous speech, fictional TV/movies for younger viewers, and narrative written texts consistently showed higher correlations with word knowledge than those derived from non-conversational speech or any non-fiction written text genre. These results suggest that mega-corpora compiled from conveniently available electronic written texts may not be ideal as scales for diagnostic vocabulary testing or as indexes used in readability formulae.
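
The comparison described above can be illustrated with a minimal sketch, offered under stated assumptions rather than as the study's actual procedure. It shows the dichotomous Rasch response function and a rank correlation between log word frequency and Rasch item difficulty. The use of Spearman's rho, the log transform, the function names, and all numeric values are hypothetical choices made for illustration; none of them come from the study.

```python
import math

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: probability of a correct response,
    with person ability and item difficulty both in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data: Rasch difficulties (logits) for five words, and the
# frequencies of the same words in two made-up reference corpora.
difficulty = [-2.0, -1.0, 0.0, 1.0, 2.0]
freq_corpus_a = [5000, 1200, 300, 80, 10]
freq_corpus_b = [5000, 300, 1200, 10, 80]

rho_a = spearman([math.log(f) for f in freq_corpus_a], difficulty)
rho_b = spearman([math.log(f) for f in freq_corpus_b], difficulty)

# A more strongly negative rho means the frequency list orders words more
# like the learners' difficulty scale (frequent words tend to be easier).
print(round(rho_a, 3), round(rho_b, 3))
```

In this toy case the first list tracks the difficulty ordering more closely than the second, which is the kind of contrast the study draws between word-grouping units and between reference corpora.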

