Abstract

Second Language (L2) testing has increasingly relied on performance assessment to evaluate test takers' practical command of the language they have acquired. However, such forms of assessment entail more complex task design and subjective human scoring judgments (Bachman, 2004), raising challenges for score dependability and score use due to variability associated with task design (Deville & Chalhoub-Deville, 2006; In’nami & Koizumi, 2016), differences in rater behavior (Bachman, Lynch, & Mason, 1995), and rating rubric functionality, especially when the rubric consists of multiple subscales (Grabowski & Lin, 2019; Sawaki, 2007; Xi, 2007). The current study illustrates the use of Multivariate Generalizability Theory (MG-Theory) analyses for examining score variability and dependability for a written performance assessment on an ESL placement test, rated with an analytic rubric comprising three subscales. In particular, the study identified task-related variability that reduced the dependability of the writing scores yielded by this test. At the same time, this variability can be substantively justified as an artifact of representing the construct of L2 writing ability in a sufficiently broad manner. Simply put, should we expect test takers to show equivalent levels of proficiency when writing a review of an experience as a customer and when writing an argumentative essay as a student?
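
For reference, "dependability" here follows the usual G-theory index. In a univariate design with persons (p) crossed with tasks (t) and raters (r), the dependability coefficient Phi is the ratio of universe-score variance to universe-score plus absolute-error variance; MG-Theory extends this by additionally estimating covariance components across the rubric subscales. A sketch of the standard formula (not a derivation specific to this study), with n'_t tasks and n'_r raters in the decision study, is:

    \Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\Delta},
    \qquad
    \sigma^2_\Delta = \frac{\sigma^2_t}{n'_t} + \frac{\sigma^2_r}{n'_r}
                    + \frac{\sigma^2_{pt}}{n'_t} + \frac{\sigma^2_{pr}}{n'_r}
                    + \frac{\sigma^2_{tr}}{n'_t n'_r} + \frac{\sigma^2_{ptr,e}}{n'_t n'_r}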

Highlights

  • Second Language (L2) testing has increasingly relied on performance assessment to evaluate “practical command of language acquired” (McNamara, 1996, as cited in Grabowski & Lin, 2019, p. 54)

  • These challenges are more acute for less-resourced institutions using L2 performance assessments for making score-based interpretations about test takers, such as a language program making decisions based on placement test scores (Bachman et al., 1996; Sawaki & Xi, 2019; Vafaee & Yaghmaeyan, 2020)

  • The Language Control scale showed the least pronounced mean difference across tasks (M = 3.05, SD = 0.98 for Task 1; M = 2.78, SD = 1.07 for Task 2), but an independent-samples t-test still revealed a statistically significant difference, albeit not quite as pronounced as for the other subscales, M = 0.27, 95% CI [0.07, 0.47], t(414) = 2.684, p = .008 (a computational sketch of this comparison follows these highlights). These findings indicate that the Content Control scale was interpreted more leniently on both tasks than the Organization Control and Language Control scales
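
The reported test statistic can be reconstructed from the published summary statistics alone. The sketch below is a minimal reproduction, assuming 208 responses per task (inferred from the reported degrees of freedom, t(414), under equal group sizes); that sample size is an assumption, not a figure stated in the highlight.

    # Minimal sketch reproducing the reported independent-samples t-test for the
    # Language Control scale from summary statistics. The per-task n of 208 is an
    # assumption inferred from the reported df: t(414) implies n1 + n2 - 2 = 414.
    from scipy import stats

    m1, sd1, n1 = 3.05, 0.98, 208   # Language Control, Task 1
    m2, sd2, n2 = 2.78, 1.07, 208   # Language Control, Task 2

    # Student's t-test computed directly from the summary statistics
    t_stat, p_val = stats.ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2, equal_var=True)

    # 95% confidence interval for the mean difference via the pooled standard error
    df = n1 + n2 - 2
    sp = (((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df) ** 0.5
    se = sp * (1 / n1 + 1 / n2) ** 0.5
    diff = m1 - m2
    margin = stats.t.ppf(0.975, df) * se

    print(f"t({df}) = {t_stat:.3f}, p = {p_val:.3f}")
    print(f"Mean difference = {diff:.2f}, 95% CI [{diff - margin:.2f}, {diff + margin:.2f}]")
    # Expected output, approximately: t(414) = 2.684, p = .008, 95% CI [0.07, 0.47]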


Introduction

Second Language (L2) testing has increasingly relied on performance assessment (e.g., a written essay, a spoken monologue) to evaluate “practical command of language acquired” (McNamara, 1996, as cited in Grabowski & Lin, 2019, p. 54). Agencies that deliver high-stakes L2 proficiency exams (e.g., Educational Testing Service, ETS) have pursued a research agenda for years to examine the role of rater, task, and rubric as sources of variability in their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). These challenges are more acute for less-resourced institutions using L2 performance assessments to make score-based interpretations about test takers, such as a language program making decisions based on placement test scores (Bachman et al., 1996; Sawaki & Xi, 2019; Vafaee & Yaghmaeyan, 2020). The organization that administers the placement exam examined in this study, the Community Language Program (CLP), operates within Teachers College, Columbia University, in conjunction with its programs in Applied Linguistics and Teaching English to Speakers of Other Languages (TESOL), and uses scores from this exam to assign test takers to an English as a Second Language (ESL) class at an appropriate proficiency level (i.e., beginner, intermediate, advanced).
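
To make the notion of sources of variability concrete, the sketch below estimates variance components for a fully crossed persons × tasks × raters (p × t × r) design from a balanced score matrix, using the standard ANOVA expected-mean-squares approach, and then computes the dependability coefficient Phi shown earlier. This is a generic univariate illustration on simulated data, not the study's multivariate analysis (which additionally models covariances among the three rubric subscales); the array sizes, variance magnitudes, and decision-study settings are assumptions for the example.

    # Univariate G-study sketch for a balanced persons x tasks x raters design.
    # The data are simulated; sizes and variance magnitudes are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n_p, n_t, n_r = 200, 2, 2          # persons, tasks, raters (assumed sizes)

    # Simulate scores X[p, t, r] = mu + person + task + rater + interactions + error
    person = rng.normal(0, 0.9, (n_p, 1, 1))
    task   = rng.normal(0, 0.3, (1, n_t, 1))
    rater  = rng.normal(0, 0.2, (1, 1, n_r))
    pt     = rng.normal(0, 0.4, (n_p, n_t, 1))
    pr     = rng.normal(0, 0.2, (n_p, 1, n_r))
    tr     = rng.normal(0, 0.1, (1, n_t, n_r))
    err    = rng.normal(0, 0.5, (n_p, n_t, n_r))
    X = 3.0 + person + task + rater + pt + pr + tr + err

    m = X.mean()
    m_p, m_t, m_r = X.mean(axis=(1, 2)), X.mean(axis=(0, 2)), X.mean(axis=(0, 1))
    m_pt, m_pr, m_tr = X.mean(axis=2), X.mean(axis=1), X.mean(axis=0)

    # Mean squares for each effect (balanced-design ANOVA)
    ms_p  = n_t * n_r * np.sum((m_p - m) ** 2) / (n_p - 1)
    ms_t  = n_p * n_r * np.sum((m_t - m) ** 2) / (n_t - 1)
    ms_r  = n_p * n_t * np.sum((m_r - m) ** 2) / (n_r - 1)
    ms_pt = n_r * np.sum((m_pt - m_p[:, None] - m_t[None, :] + m) ** 2) / ((n_p - 1) * (n_t - 1))
    ms_pr = n_t * np.sum((m_pr - m_p[:, None] - m_r[None, :] + m) ** 2) / ((n_p - 1) * (n_r - 1))
    ms_tr = n_p * np.sum((m_tr - m_t[:, None] - m_r[None, :] + m) ** 2) / ((n_t - 1) * (n_r - 1))
    resid = (X - m_pt[:, :, None] - m_pr[:, None, :] - m_tr[None, :, :]
             + m_p[:, None, None] + m_t[None, :, None] + m_r[None, None, :] - m)
    ms_e  = np.sum(resid ** 2) / ((n_p - 1) * (n_t - 1) * (n_r - 1))

    # Variance components via expected mean squares (negative estimates set to 0)
    v_ptr = ms_e
    v_pt  = max((ms_pt - ms_e) / n_r, 0)
    v_pr  = max((ms_pr - ms_e) / n_t, 0)
    v_tr  = max((ms_tr - ms_e) / n_p, 0)
    v_p   = max((ms_p - ms_pt - ms_pr + ms_e) / (n_t * n_r), 0)
    v_t   = max((ms_t - ms_pt - ms_tr + ms_e) / (n_p * n_r), 0)
    v_r   = max((ms_r - ms_pr - ms_tr + ms_e) / (n_p * n_t), 0)

    # D-study: dependability (Phi) for decisions based on 2 tasks and 2 raters
    nt_d, nr_d = 2, 2
    abs_error = (v_t / nt_d + v_r / nr_d + v_pt / nt_d + v_pr / nr_d
                 + v_tr / (nt_d * nr_d) + v_ptr / (nt_d * nr_d))
    phi = v_p / (v_p + abs_error)
    print({"p": v_p, "t": v_t, "r": v_r, "pt": v_pt, "pr": v_pr, "tr": v_tr, "ptr,e": v_ptr})
    print(f"Dependability (Phi) with {nt_d} tasks and {nr_d} raters: {phi:.2f}")

In this framework, a large task (t) or person-by-task (pt) component relative to the person (p) component is what lowers Phi; that is the pattern the abstract describes as task-related variability reducing score dependability.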

