Abstract

Scoring is a fundamental step in the assessment of writing performance. The choice of scoring procedure, as well as the adoption of a discrepancy resolution method, can affect the psychometric properties of the scores and therefore the final pass/fail decision. Within a comprehensive framework that treats scoring as part of the validation process, this paper evaluates the impact of the rater mean, parity and tertium quid procedures on score properties. Using data from a writing assessment task administered in a professional context, the paper analyses score reliability, dependability, unidimensionality and decision accuracy on two data sets: the complete data and a subsample of discrepant ratings. The results show that the tertium quid procedure performs better on reliability indicators but is weaker in defining construct unidimensionality.

• Analysing operational scoring methods in writing assessment contributes to the validation process of the scores.
• Validity and reliability involve many trade-offs in the assessment of writing performance.
• The tertium quid resolution procedure has a positive impact on score reliability.
• The tertium quid score resolution method can introduce systematic bias into the construct definition.
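
The abstract does not spell out how each procedure combines ratings, and operational rules vary across testing programmes. As a minimal sketch only, assuming one common operationalisation (rater mean: average the two original ratings; parity: average the two original ratings and an adjudicator's rating; tertium quid: average the adjudicator's rating with the closer of the two original ratings), the three procedures might look like this in Python. The function names, the discrepancy threshold and the exact combination rules are illustrative assumptions, not taken from the paper.

```python
from typing import Optional

# Illustrative sketch of three discrepancy-resolution procedures.
# The operationalisations below are assumptions for illustration only;
# the paper itself may define the procedures differently.

def rater_mean(score_a: float, score_b: float) -> float:
    """Average the two original ratings, regardless of any discrepancy."""
    return (score_a + score_b) / 2


def parity(score_a: float, score_b: float, adjudicator: float) -> float:
    """Average all three ratings when an adjudicating rating is obtained."""
    return (score_a + score_b + adjudicator) / 3


def tertium_quid(score_a: float, score_b: float, adjudicator: float) -> float:
    """Average the adjudicator's rating with the original rating closest to it."""
    closest = min((score_a, score_b), key=lambda s: abs(s - adjudicator))
    return (closest + adjudicator) / 2


def resolve(score_a: float, score_b: float,
            adjudicator: Optional[float] = None,
            threshold: float = 1.0,
            method: str = "tertium_quid") -> float:
    """Adjudicate only when the two original ratings are discrepant."""
    if adjudicator is None or abs(score_a - score_b) <= threshold:
        return rater_mean(score_a, score_b)
    if method == "parity":
        return parity(score_a, score_b, adjudicator)
    return tertium_quid(score_a, score_b, adjudicator)


# Example: raters award 3 and 5 (discrepant at threshold 1), adjudicator awards 5.
print(resolve(3, 5, adjudicator=5))                    # tertium quid -> 5.0
print(resolve(3, 5, adjudicator=5, method="parity"))   # parity -> 4.33...
```

Under these assumed rules, the contrast between the procedures is visible even in a single case: tertium quid discards the outlying rating, while parity and the rater mean retain its influence on the final score.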
