Abstract

Expanding paradigms of language learning and testing prompt the need for objective methods of assessing language proficiency from spontaneous speech. In this paper, we study new measures of syntactic complexity for use in automatic scoring systems for second language spontaneous speech. In contrast to most existing measures, which estimate competence levels indirectly from the length of production units or the frequency of specific grammatical structures, we capture differences in the distribution of morpho-syntactic features across learners' proficiency levels. We build score-specific models of part-of-speech (POS) tag distribution from a large corpus of spontaneous second language English utterances and use them to measure syntactic complexity. Given a speaker's response, we measure its similarity to a set of utterances scored for proficiency by human raters, comparing the distribution of POS tags in the response with that of each score level. The underlying distribution of POS tags (indicative of syntactic complexity) is represented via two models: a vector-space model and a language model. Empirical results suggest that the proposed measures of syntactic complexity show an association with human-rated proficiency scores comparable to that of conventional measures. They are also substantially more robust to errors introduced by automatic speech recognition, making them better suited to operational automated scoring applications. When used in combination with other measures of oral proficiency in a state-of-the-art scoring model, the predicted scores show improved agreement with human-assigned scores over a baseline scoring model without our proposed features.
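The vector-space comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tag sets, score levels, and score-specific distributions are hypothetical, and cosine similarity over POS unigram distributions is assumed as one concrete instance of the similarity measure between a response and a score level.

```python
from collections import Counter
import math

def pos_distribution(tags):
    """Normalized unigram distribution over POS tags in one response."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse POS tag distributions."""
    dot = sum(p[t] * q[t] for t in set(p) & set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)

def similarity_features(response_tags, score_models):
    """One similarity feature per proficiency score level."""
    resp = pos_distribution(response_tags)
    return {score: cosine_similarity(resp, model)
            for score, model in score_models.items()}

# Hypothetical score-level POS distributions, as would be estimated
# from human-scored training responses (values here are invented).
score_models = {
    2: {"NN": 0.5, "VB": 0.3, "DT": 0.2},
    4: {"NN": 0.3, "VB": 0.2, "DT": 0.1, "IN": 0.2, "JJ": 0.2},
}

# A response dominated by nouns and bare verbs resembles the
# low-score profile more than the high-score one.
feats = similarity_features(["NN", "VB", "DT", "NN"], score_models)
```

In a scoring pipeline, each per-level similarity would serve as one feature for the downstream proficiency model, alongside conventional complexity measures.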
