Abstract

Raters’ experience can contribute substantially to variability in rating performance in educational scoring, and heterogeneous ratings can negatively affect examinees’ results. The aim of this study is to examine raters’ rating performance, indicated by rater severity, in assessing oral tests among lower secondary school students using the Multi-Facet Rasch Measurement (MFRM) model. The respondents were thirty English Language teachers clustered into two groups based on their rating experience in high-stakes assessment. The respondents listened to ten examinees’ recorded answers to three oral test items and provided their ratings. The instruments comprised the test items, examinees’ answers, a scoring rubric, and a scoring sheet used to appraise examinees’ competence in three domains: vocabulary, grammar, and communicative competence. MFRM analysis showed that raters varied in their severity levels (chi-square χ2 = 2.661), with severity measures ranging from −1.45 to 2.13 logits. An independent t-test indicated a significant difference between the ratings provided by the inexperienced and the experienced raters (t = −0.96, df = 28, p < 0.01). The findings suggest that assessment developers must ensure raters are well versed, whether through assessment practice or rater training, before they rate examinees in operational settings. Further research is needed to account for the varying effects of rating experience in other assessment contexts and for the effects of interaction between facets on estimates of examinees’ measures. The present study provides additional evidence on the role of rating experience in encouraging raters to provide accurate ratings.
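As a rough illustration of the group comparison reported above, the independent t-test on rater severity measures can be sketched in Python. The severity values below are hypothetical placeholders (two groups of fifteen raters each, giving df = 28 as in the study), not the study's data; the function implements the standard pooled-variance Student's t-test, not the specific software used by the authors.

```python
import math
import statistics

# Hypothetical severity measures in logits for two rater groups
# (illustrative values only, NOT the study's data).
inexperienced = [0.85, 1.20, -0.40, 1.60, 0.95, 0.30, 1.10, -0.15,
                 0.70, 1.45, 0.55, 1.30, -0.60, 0.90, 1.75]
experienced   = [0.10, -0.35, 0.45, -0.80, 0.20, -0.50, 0.60, -0.25,
                 0.05, -1.10, 0.35, -0.70, 0.15, -0.45, 0.25]

def independent_t(a, b):
    """Student's t-test for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # t statistic and degrees of freedom

t, df = independent_t(inexperienced, experienced)
print(f"t({df}) = {t:.2f}")
```

With two groups of fifteen raters, the degrees of freedom come out to 15 + 15 − 2 = 28, matching the df reported in the abstract; the t value itself depends on the severity measures, which here are invented.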

Highlights

  • Rater-mediated assessment is ubiquitous in education systems around the world

  • The present study was designed to determine differences in rating performance between inexperienced and experienced raters within the context of oral tests, and to confirm findings from previous studies conducted in different contexts

  • Through Multi-Facet Rasch Measurement (MFRM) analysis, one of the significant findings that emerged from this study was that raters with different levels of experience showed nonuniform severity, and the experienced raters displayed more consistency than the inexperienced raters

Introduction

Rater-mediated assessment is ubiquitous in education systems around the world. It is indispensable in high-stakes assessment for appraising examinees’ competence in complex traits such as speaking, writing, and art, in order to screen examinees for important selections such as university enrolment and job interviews. The use of raters to assess examinees’ competence in high-stakes assessment affects examinees’ final marks (Engelhard & Wind, 2018). This impact, known as the rater effect, is systematically attributed to rater variability and results in variance in observed ratings (Scullen, Mount & Goff, 2000). As a consequence, examinees receive marks that deviate from their actual proficiency in the assessed domains (Myford & Wolfe, 2003).

