Abstract

Rater variability has long been identified as an important source of measurement error in performance assessment, especially in oral proficiency testing. Rater training is commonly used as a means of compensating for various sources of rater variability and improving the quality of raters' assessments. However, little research has examined the nature of training programs and raters' perceptions of them using both qualitative and quantitative research designs. Likewise, despite previous data on test takers' reactions to oral test performance, little research has addressed the use of test feedback, raters' perceptions of the feedback given on their scoring performance, and its potential usefulness for their subsequent ratings. In this study, twenty raters rated 300 test takers' oral performances before and after a training program, and their perceptions, attitudes, expectations, and evaluations were identified via questionnaires, interviews, and observations. The qualitative and quantitative analyses showed that training programs were effective in addressing raters' attitudes, perceptions, and evaluations, which in turn reduced their severity and biases and increased their consistency. In addition, informing raters of the goals of performance assessment in training programs resulted in a smaller halo effect. Finally, raters with positive attitudes toward rating feedback were able to incorporate it more successfully into their rating and thus achieved greater consistency and less bias in their subsequent ratings. Consequently, decision-makers should be less concerned about raters' expertise levels and should instead establish rater training programs to increase rater consistency and reduce bias in measurement.

Highlights

  • The ability to speak in a second language is widely recognized as an important skill for educational, business, and personal reasons (Kim, 2015)

  • Rater variability has always been identified as an important source of measurement error in performance assessment, especially for oral proficiency tests

  • One issue which is at the heart of both reliability and validity in essay scoring is that of rater training (Fulcher, 2003)


Summary

Introduction

The ability to speak in a second language is widely recognized as an important skill for educational, business, and personal reasons (Kim, 2015). In many oral performance tests, students' performances are rated subjectively using a rating scale whose descriptors guide the rater in assigning a score (Kuiken & Vedder, 2014). Because such tests require subjective evaluations of speaking quality, a great deal of research emphasis has been placed on achieving an acceptable level of inter-rater reliability in order to show that spoken language can be scored as fairly and consistently as possible. A number of possible sources of rater disagreement have been studied in the speaking assessment literature (e.g., Brown, 2003; In'nami & Koizumi, 2016), and these can have a serious impact on raters' assessments, both positively and negatively.

