Abstract

While numerous articles on Criterion® have been published and its validity evidence has accumulated, test users need to obtain relevant validity evidence for their local context and develop their own validity argument. This paper aims to provide validity evidence for the interpretation and use of Criterion® for assessing second language (L2) writing proficiency at a university in Japan. We focused on three perspectives: (a) differences in the difficulty of prompts in terms of Criterion® holistic scores, (b) relationships between Criterion® holistic scores and indicators of L2 proficiency, and (c) changes in Criterion® holistic and writing quality scores at three time points over 28 weeks. We used Rasch analysis (to examine (a)), Pearson product–moment correlations (to examine (b)), and multilevel modeling (to examine (c)). First, we found statistically significant but minor differences in prompt difficulty. Second, Criterion® holistic scores were found to be relatively weakly but positively correlated with indicators of L2 proficiency. Third, Criterion® holistic and writing quality scores—particularly, essay length and syntactic complexity—significantly improved, and thus are sensitive measures of the longitudinal development of L2 writing. All the results can be used as backing (i.e., positive evidence) for validity when we interpret Criterion® holistic scores as reflecting L2 writing proficiency and use the scores to detect gains in L2 writing proficiency. All of these results help to accumulate validity evidence for an overall validity argument in our context.

Highlights


  • Most of the correlations were relatively weak but positive, including the correlation between Criterion® holistic scores at Time 3 and Test of English as a Foreign Language (TOEFL) iBT® writing scores obtained at around the same period as Time 3 (r = .34; 95% confidence interval (CI) = .13, .52)

  • We consider the weak but positive correlations to be positive evidence. For the relationship between Criterion® holistic scores and TOEFL iBT® writing scores in particular, one of the two TOEFL iBT® writing tasks was an integrated task whose features differed from those of the Criterion® task, so a strong correlation would not be expected
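As an illustration of how a confidence interval such as the one reported above for r = .34 can be obtained, the following is a minimal sketch using the Fisher z-transformation. The method is a common textbook approach and is an assumption here; this section does not state how the authors computed their interval. The function name `pearson_ci` is hypothetical, and the sample size `n = 80` is a placeholder for illustration only, since the sample size is not given in this section.

```python
import math
from statistics import NormalDist


def pearson_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via Fisher's z.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    # Back-transform the interval limits to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# n = 80 is an assumed sample size, NOT a value reported in this section.
lower, upper = pearson_ci(0.34, n=80)
print(f"r = .34, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because the interval is built on the z scale and then back-transformed, it is not symmetric around r, and it narrows as the sample size grows.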


Summary

Introduction

Although previous studies have accumulated multiple pieces of validity evidence for the interpretation and use of Criterion®, validity evidence for local users is essential for interpreting and using test scores in a meaningful way. We intend to provide such evidence in the context of assessing second language (L2) writing proficiency at a university in Japan. Koizumi et al., Language Testing in Asia (2016) 6:5

