Abstract

The present study investigated factors affecting EFL writing scores using generalizability theory (G-theory). To this end, one hundred and twenty students completed one independent and one integrated writing task. Their performances were then scored by six raters: one self-rating, three peer ratings, and two instructor ratings. The main purpose of the study was to determine the relative and absolute contributions of different facets (student, rater, task, method of scoring, and background of education) to the validity of writing assessment scores. The results indicated three major sources of variance: (a) the student by task by method of scoring (nested in background of education) interaction (STM:B), contributing 31.8% of the total variance; (b) the student by rater by task by method of scoring (nested in background of education) interaction (SRTM:B), contributing 26.5%; and (c) the student by rater by method of scoring (nested in background of education) interaction (SRM:B), contributing 17.6%. Given the G-coefficients obtained in the G-study (relative G-coefficient ≥ 0.86), the assessment results were found to be highly valid and reliable. The sources of error variance were the student by rater (nested in background of education) interaction (SR:B) and the rater by background of education interaction, contributing 99.2% and 0.8% of the error variance, respectively. Additionally, ten separate G-studies were conducted to investigate the contribution of different facets with rater, task, and method of scoring as the differentiation facet. These studies suggested that peer rating, the analytic scoring method, and integrated writing tasks were the most reliable and generalizable designs for writing assessment.
Finally, five decision-making studies (D-studies) were conducted at the optimization level, indicating that at least four raters (G-coefficient = 0.80) are necessary for a valid and reliable assessment. Based on these results, to achieve the greatest gain in generalizability, teachers should have their students take two writing assessments and have their performance rated with at least two scoring methods by at least four raters.
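The D-study logic above can be sketched numerically. In the simplest one-facet case, the relative G-coefficient is Eρ² = σ²s / (σ²s + σ²δ / nr), where σ²s is the student (universe-score) variance, σ²δ is the relative error variance for a single rater, and nr is the number of raters averaged over. The variance components in the sketch below are hypothetical placeholders, chosen only so that four raters yield G = 0.80 as in the finding above; they are not the study's actual estimates.

```python
# Illustrative D-study sketch: how the relative G-coefficient grows as raters
# are added. Variance components are hypothetical placeholders, not the
# study's estimated values.

def relative_g(sigma2_student, sigma2_rel_error, n_raters):
    """Relative G-coefficient when rater is the only facet averaged over."""
    return sigma2_student / (sigma2_student + sigma2_rel_error / n_raters)

sigma2_student = 1.0    # universe-score (student) variance, hypothetical
sigma2_rel_error = 1.0  # relative error variance for one rater, hypothetical

for n in range(1, 7):
    print(n, round(relative_g(sigma2_student, sigma2_rel_error, n), 2))
# 1 0.5
# 2 0.67
# 3 0.75
# 4 0.8
# 5 0.83
# 6 0.86
```

With these placeholder values, adding raters yields diminishing returns: the jump from one to four raters raises G from 0.50 to 0.80, while further raters add only a few hundredths, which is why a D-study is used to find the smallest design that reaches an acceptable coefficient.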

Highlights

  • Nowadays, testing and assessment are integrated into contemporary life, and students around the world are assessed continually for two purposes: first, to examine their educational progress and, second, to evaluate the quality of educational systems (Fulcher & Davidson, 2007)

  • In the independent part of the study, students received a TOEFL writing topic and were asked to write a five-paragraph essay on that topic

  • For the integrated part of the procedure, students first read a short passage for 10 min as an introduction to the process; they then listened to an audio track presenting ideas for and against the topic; finally, they combined their own ideas with those from the reading and the audio clip and worked on their five-paragraph essays for 7 min



Introduction

Nowadays, testing and assessment are integrated into contemporary life, and students around the world are assessed continually for two purposes: first, to examine their educational progress and, second, to evaluate the quality of educational systems (Fulcher & Davidson, 2007). Writing assessment scores are associated with more error for EFL learners than for native speakers (Huang, 2012). Generally, factors affecting students’ writing scores can be divided into two types: rater-related and task-related (Huang, 2011). Before discussing the factors affecting writing assessment scores, it should be noted that students’ different cultural backgrounds and linguistic abilities make writing assessment a problematic issue. Many research studies have investigated the following factors as major sources of systematic error in ESL writing: raters’ linguistic and academic background, raters’ tolerance for errors, rater training, and types of writing tasks (Ferris, 1994; Song & Caruso, 1996; Weigle, 1998). Although research in this area (Mehrani & Khodi, 2014) has explored factors that influence writing assessment scores, the relative impacts of these factors have been neglected.

