Abstract

This study aimed to examine the interrater reliability of scoring paragraph writing skills in a foreign language using measurement invariance tests. The study group consisted of 267 students studying English at the Preparatory School of Gazi University. Each student wrote a paragraph on the same topic, and the paragraphs were scored independently by three different raters using the same scoring key. Evidence for the validity of the measurements was collected with exploratory and confirmatory factor analysis (EFA and CFA), while evidence for the reliability of the measurements was collected with the Cronbach's alpha (α) coefficient. When measurement invariance across raters was tested with multi-group confirmatory factor analysis (MGCFA), evidence of configural and metric invariance was obtained, but no evidence of full or partial scalar invariance was found. The lack of scalar invariance indicates that the raters did not score the writing performances from the same baseline level of performance. Consequently, invariant uniquenesses and invariant factor variances could not be tested, and therefore no evidence of interrater reliability could be obtained.
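As a rough sketch of the testing sequence (the standard MGCFA formulation, not reproduced from the article itself), the nested invariance models add equality constraints across the rater groups g step by step:

  \begin{aligned}
  \text{Configural:}\quad & x_{ig} = \tau_g + \Lambda_g \xi_{ig} + \delta_{ig} && \text{(same factor pattern in every group)} \\
  \text{Metric:}\quad & \Lambda_g = \Lambda \ \forall g && \text{(equal factor loadings)} \\
  \text{Scalar:}\quad & \Lambda_g = \Lambda,\ \tau_g = \tau \ \forall g && \text{(equal loadings and intercepts)}
  \end{aligned}

Strict invariance would additionally constrain the uniquenesses, \Theta_g = \Theta; that is the step the study could not reach once scalar invariance failed.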

Highlights

  • This study aimed to examine the interrater reliability of scoring paragraph writing skills in a foreign language using measurement invariance tests.

  • Paragraph unity is related to the basic elements of the paragraph: writing the topic sentence (WTS), writing the supporting sentence (WSS), writing example sentences (WES), and writing the concluding sentence (WCS); a minimal scoring sketch using these components follows this list.

  • While evidence of configural and metric invariance was obtained, no evidence of scalar invariance was found in this study, in which interrater reliability was examined through measurement invariance tests.
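To illustrate the reliability side, below is a minimal, hypothetical sketch of computing the Cronbach's alpha (α) coefficient over the four rubric components for a single rater's scores. The data values, the 2-5 score range, and the use of the pingouin library are assumptions for illustration, not the study's actual materials.

  # A minimal, hypothetical sketch: Cronbach's alpha over the four rubric
  # components (WTS, WSS, WES, WCS) for one rater's scores.
  import pandas as pd
  import pingouin as pg

  # Made-up component scores for six students from a single rater; in the
  # study, each of the three raters scored 267 students on these components.
  scores = pd.DataFrame({
      "WTS": [4, 3, 5, 2, 4, 3],
      "WSS": [4, 2, 5, 3, 4, 3],
      "WES": [3, 3, 4, 2, 5, 2],
      "WCS": [4, 3, 5, 2, 4, 3],
  })

  # Returns the alpha estimate and its 95% confidence interval.
  alpha, ci = pg.cronbach_alpha(data=scores)
  print(f"alpha = {alpha:.2f}, 95% CI = {ci}")

In the study's design, such a coefficient would be computed from each rater's scores separately; alpha alone, however, cannot show whether the three raters score from the same baseline, which is why the invariance tests above are needed.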

Introduction

This study examined the interrater reliability of scoring paragraph writing skills in a foreign language using measurement invariance tests. Measurement errors arising from raters are an important factor affecting the reliability of scores. These aspects emphasize the importance of having the same performance evaluated by different raters, that is to say, interrater reliability (Antonioni & Park, 2001; Attali, 2005). In the literature, rater-derived variability in scores has typically been examined with generalizability theory (Barkaoui, 2007; Elorbany & Huang, 2012; Kondo-Brown, 2002; Stuhlmann et al., 1999). The lack of evidence of scalar invariance found here means that the raters did not score the writing performances from the same baseline; in this case, invariant uniquenesses and invariant factor variances could not be tested, and no evidence of interrater reliability could be obtained.
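For context on the generalizability-theory approach mentioned above: in a fully crossed persons-by-raters (p × r) design, the standard relative generalizability coefficient (a general formula from that literature, not a result of this article) is

  E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r}

where \sigma^2_p is the variance among students, \sigma^2_{pr,e} is the person-by-rater interaction confounded with residual error, and n_r is the number of raters.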
