Abstract

With the increased use of constructed-response items in large-scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in Applied Measurement in Education 6:103–118, 1993). In response to these scoring cost concerns, various forms of automated systems for scoring constructed-response items have been developed and used. The purpose of this research is to provide a comprehensive analysis of the generalizability of automated scoring results and to compare it to that of scores produced by human raters. The results of this study provide evidence supporting the argument that the automated scoring system offers outcomes nearly as reliable as those produced by human scoring. Based on these findings, the automated scoring system appears to be a promising alternative to human scoring, particularly for short factual-answer items.
