Abstract

The present study reports the development and validation of a rating scale for the Iranian EFL academic writing assessment context. To achieve this goal, the study was conducted in three distinct phases. Early in the study, the researcher interviewed a number of raters at different universities. Next, a questionnaire was developed based on the results of the interviews along with the related literature, and was sent to thirty experienced raters from ten major state universities in Iran. Results of this country-wide survey showed that no objective scale was in use by raters in the context. Therefore, in the second, development phase of the study, fifteen of the raters who had participated in the first phase were asked to verbalize their thoughts while each rated five essays. At the end of this phase, a first draft of the scale was produced. Finally, in the last, validation phase of the study, ten raters were each asked to rate a body of twenty essays using the newly developed scale, and eight of them then participated in a follow-up retrospective interview. Analysis of the raters’ performance using FACETS showed a strong profile of reliability and validity for the new scale. In addition, while the qualitative findings of the interviews pointed to some problems with the structure of the scale, on the whole they showed that the introduction of the scale was well received by the raters. The pedagogical implications of the study are discussed, and the study calls for further validation of the scale in the context.
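For readers unfamiliar with FACETS, the program implements the many-facet Rasch model (Linacre, 1989). As a minimal sketch, assuming a generic specification with essays, rating criteria, and raters as the facets (not necessarily the exact facets used in this study), the model takes the form

\[ \log\!\left( \frac{P_{nijk}}{P_{nij(k-1)}} \right) = B_n - D_i - C_j - F_k \]

where \(P_{nijk}\) is the probability of essay \(n\) receiving score category \(k\) on criterion \(i\) from rater \(j\), \(B_n\) is the writing ability reflected in essay \(n\), \(D_i\) is the difficulty of criterion \(i\), \(C_j\) is the severity of rater \(j\), and \(F_k\) is the difficulty of the step from category \(k-1\) to \(k\). Fit and separation statistics from a model of this kind are what support claims about rater reliability and scale functioning.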

Highlights

  • In performance-based assessment, scoring rubrics are important as they show the construct to be performed and measured

  • Through careful investigation of the items that explored the existence of a rating scale among Iranian English as a foreign language (EFL) raters, it was found that the raters who contributed to this study doubted the existence of an objective rating scale in their rating practice

  • While a substantial number of raters disagreed with an impressionistic approach to scoring (56.66%, item 15) and strongly believed that all raters apply some criteria in their scoring (80%, item 17), they held differing attitudes toward a common rating scale in their own rating practice


Introduction

In performance-based assessment, scoring rubrics (variously called rating scales or marking schemes) are important as they show the construct to be performed and measured (2010, p. 43) and can reduce the long-recognized problem of rater variability (Bachman, Lynch, & Mason, 1995; McNamara, 1996). As a result, they promote reliable test scores and valid score inferences (Boettger, 2010; Crusan, 2010; Crusan, 2015; Dempsey, PytlikZillig, & Bruning, 2009; Knoch, 2009; Lukácsi, 2020; Rakedzon & Baram-Tsabari, 2017). McNamara (1996, p. 182) asserts that in the field of language assessment, “we are frequently presented with rating scales as products for consumption and are told little of their provenance and of their rationale.” While ad hoc rubrics developed without such a rationale can help teachers with classroom assessment, high-stakes tests with significant impacts on the educational lives of stakeholders call for rubrics grounded in theory (Knoch, 2011; McNamara, 1996).

