Investigating whether a flemma count is a more distinctive measurement of lexical diversity

Thwin Myint Myint Maw, Jon Clenton, George Higginbotham

doi:10.1016/j.asw.2022.100640

Abstract

Lexical diversity (LD) measures, which capture the range of vocabulary deployed in a written or spoken sample, have been shown to predict L2 language proficiency. Treffers-Daller et al. (2018), however, suggest that the unit of analysis influences how well LD measures predict language proficiency, and they highlight the greater impact of a lemma count on LD measurement. Despite evidence that a lemma count is useful, no LD study has empirically examined a flemma count. We therefore partially replicate Treffers-Daller et al. to explore how a flemma count, compared with simple and lemma counts, influences how well LD measures predict writing proficiency. We analyzed 105 IELTS essays by Chinese L2 learners, completed at a UK university. We computed LD scores for non-lemmatized, lemmatized and flemmatized texts using three basic LD measures (Types, TTR, Guiraud’s Index) and three sophisticated measures (D, MTLD, HD-D). Results suggest that both flemmatization and lemmatization influenced LD scores and measures, and that the predictive power of an LD measure depends on the unit of analysis. All three basic measures and D were reliable indicators of writing proficiency when based on flemma and lemma counts, whereas HD-D was a better predictor of writing when simple and lemma counts were applied. MTLD, however, failed to predict any writing level. We conclude that different units of analysis influence LD measures in different ways.

• Lemmatization vs flemmatization is important in lexical diversity assessment.
• Lexical diversity measures are important indicators of writing proficiency.
• Lexical diversity measures’ predictability is dependent on the word counting criterion.
• Flemma and lemma counts are more distinctive lexical units than simple count.
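To make the three counting units concrete, the following is a minimal sketch of the basic measures named in the abstract (Types, TTR, Guiraud’s Index) computed under a simple, a lemma and a flemma count. It is not the authors’ pipeline. It assumes the usual definitions: TTR is types divided by tokens, Guiraud’s Index is types divided by the square root of tokens, a lemma groups inflected forms that share a part of speech, and a flemma groups them regardless of part of speech. The sample sentence, its hand-assigned tags, and the names `TOKENS`, `units` and `basic_measures` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical, hand-annotated sample: (surface form, base form, part of speech).
# A real analysis would obtain base forms and tags from a lemmatizer/tagger.
TOKENS = [
    ("she", "she", "PRON"),
    ("walks", "walk", "VERB"),
    ("daily", "daily", "ADV"),
    ("and", "and", "CCONJ"),
    ("walked", "walk", "VERB"),
    ("a", "a", "DET"),
    ("long", "long", "ADJ"),
    ("walk", "walk", "NOUN"),
    ("yesterday", "yesterday", "ADV"),
]


def units(tokens, mode):
    """Map each token to the unit that is counted as a 'type'."""
    if mode == "simple":   # every distinct surface form is its own type
        return [form for form, _, _ in tokens]
    if mode == "lemma":    # inflections merge only within one part of speech
        return [(base, pos) for _, base, pos in tokens]
    if mode == "flemma":   # inflections merge across parts of speech
        return [base for _, base, _ in tokens]
    raise ValueError(f"unknown counting mode: {mode}")


def basic_measures(tokens, mode):
    """Return Types, TTR and Guiraud's Index for the chosen counting unit."""
    counted = units(tokens, mode)
    n_tokens = len(counted)
    n_types = len(set(counted))
    return {
        "types": n_types,
        "ttr": n_types / n_tokens,
        "guiraud": n_types / math.sqrt(n_tokens),
    }


if __name__ == "__main__":
    for mode in ("simple", "lemma", "flemma"):
        print(mode, basic_measures(TOKENS, mode))
```

In this toy sample the verb forms "walks" and "walked" merge under a lemma count, and the noun "walk" additionally merges with them under a flemma count, so the number of types falls from one unit to the next while the token count stays fixed.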

Full Text