Abstract

Lexical frequency benchmarks have been extensively used to investigate second language (L2) lexical sophistication, especially in language assessment studies. However, indices based on semantic co-occurrence, which may be a better representation of the experience language users have with lexical items, have not been sufficiently tested as benchmarks of lexical sophistication. To address this gap, we developed and tested indices based on semantic co-occurrence from two computational methods, namely, Latent Semantic Analysis and Word2Vec. The indices were developed from one L2 written corpus (i.e., EF Cambridge Open Language Database [EF-CAMDAT]) and one first language (L1) written corpus (i.e., Corpus of Contemporary American English [COCA] Magazine). Available L1 semantic context indices (i.e., Touchstone Applied Sciences Associates [TASA] indices) were also assessed. To validate the indices, they were used to predict L2 essay quality scores as judged by human raters. The models suggested that the semantic context indices developed from EF-CAMDAT and TASA, but not the COCA Magazine indices, explained unique variance in the presence of lexical sophistication measures. This study suggests that semantic context indices based on multi-level corpora, including L2 corpora, may provide a useful representation of the experience L2 writers have with input, which may assist with automatic scoring of L2 writing.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.