Abstract
Degrees, unlike entities or events, refer to comparative qualities and are closely tied to gradable adjectives such as “tall.” Second language (L2) research has examined degree expressions with respect to learnability, first language (L1) transfer, contrastive analysis, and acquisition difficulty. However, a computational approach to the learning of degree expressions in L2 contexts, particularly for L1 Chinese learners of English, has not been thoroughly investigated. This study fills this gap by applying natural language processing (NLP) methods informed by recent advances in large language models (LLMs). We extend Cong's (2024) general-purpose assessment pipeline to analyze degree expressions specifically, predicting that surprisal metrics will correlate with the proficiency levels and distinct developmental stages of L2 learners. Crucially, we address the limitations of surprisal metrics in capturing underuse or avoidance, both common in L2 development, by integrating frequency-based analyses. Using an NLP pipeline built with Stanza, we automatically identified and analyzed degree expressions and fit linear mixed-effects models to track L2 developmental trajectories. Our findings reveal that as proficiency increases, learners use complex degree expressions more frequently, supporting theories linking difficulty and learnability. Higher surprisal values are associated with lower proficiency in using degree expressions, and surprisal is more predictive of proficiency with degree expressions than classic NLP measures. These results add further evidence that LLMs and NLP tools provide valuable insights into L2 development, specifically in the domain of degree expressions, extending previous research and offering new approaches to understanding L2 learning processes.