Abstract

To manage the scoring of constructed-response (CR) items for Korean large-scale assessments effectively, this study implements an automatic scoring (AS) system for sentence-level responses, based on the prototype designed in 2014, and uses it to score CR items from the 2014 National Assessment of Educational Achievement (NAEA). We scored answers to six CR items in Korean language, social studies, and science from the NAEA 2014 using the AS system. The AS scores were highly consistent with human scores, with exact agreement rates of 96.1% to 99.7% and correlation coefficients of 0.82 to 0.99 between the two scoring methods. The exact agreement rates for this year's AS system were higher than those for the prototype, indicating that the system's performance has improved. To guarantee scoring accuracy, the AS program for Korean CR items uses a human-machine collaborative, stepwise scoring method. This study provides evidence that automated scoring can be reliable and efficient and could serve as a useful complement to human scoring in large-scale assessments.
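The two consistency metrics reported above can be computed as follows. This is an illustrative sketch, not the authors' implementation; the score vectors are hypothetical.

```python
# Illustrative sketch (not the authors' code): the two consistency metrics
# reported in the abstract -- exact agreement and the Pearson correlation
# coefficient -- between automated and human scores for one CR item.

def exact_agreement(auto_scores, human_scores):
    """Percentage of responses where both methods assign the same score."""
    matches = sum(a == h for a, h in zip(auto_scores, human_scores))
    return 100.0 * matches / len(auto_scores)

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical 0-2 point scores for ten responses (one disagreement):
auto  = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0]
human = [2, 1, 0, 2, 2, 1, 0, 2, 2, 0]
print(exact_agreement(auto, human))        # 90.0
print(round(pearson_r(auto, human), 3))    # 0.939
```

In the study itself, exact agreement on the six NAEA items ranged from 96.1% to 99.7% and correlations from 0.82 to 0.99.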
