Abstract

Using generalizability (G-) theory and rater think-aloud protocols (TAPs), this study examined how person, task, rater, and the interactions among these facets affect the variability and reliability of writing scores on the HSK-6 (an international standardized assessment of Chinese proficiency) assigned by national HSK writing raters. It also examined the raters' scoring decision-making processes. Sixty-four HSK-6 writing samples written by 32 CFL (Chinese as a foreign language) learners from 17 L1 (first language) backgrounds were scored holistically by ten experienced HSK writing raters using the authentic HSK-6 scoring rubric. Immediately after scoring each sample, the raters were invited to produce a written retrospective TAP of their scoring decision-making process, which yielded 64 protocols per rater and a total of 640 protocols for the qualitative data analysis. The G-theory results indicated that the current single-task, two-rater holistic scoring scheme would be unable to yield acceptable generalizability and dependability coefficients. The rater TAP results also revealed considerable variation among raters in their scoring decision-making processes. Important implications for HSK-6 writing assessment policy makers in China are discussed.
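For readers unfamiliar with how generalizability and dependability coefficients depend on the number of tasks and raters, the following is a minimal sketch of the standard decision-study formulas. It assumes a fully crossed, random-effects person × task × rater design, which the abstract does not state explicitly. The names `VarianceComponents` and `d_study` are illustrative, and the variance components in the usage lines are made-up placeholders, not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Estimated variance components for an assumed crossed p x t x r design."""
    p: float      # persons (the object of measurement)
    t: float      # tasks
    r: float      # raters
    pt: float     # person-by-task interaction
    pr: float     # person-by-rater interaction
    tr: float     # task-by-rater interaction
    ptr_e: float  # three-way interaction confounded with residual error


def d_study(vc: VarianceComponents, n_tasks: int, n_raters: int) -> tuple[float, float]:
    """Return (generalizability, dependability) for a given scoring scheme."""
    # Relative error: only interactions involving persons affect rank ordering.
    rel_err = (
        vc.pt / n_tasks
        + vc.pr / n_raters
        + vc.ptr_e / (n_tasks * n_raters)
    )
    # Absolute error: additionally includes task and rater main effects
    # and their interaction, which shift absolute score levels.
    abs_err = (
        rel_err
        + vc.t / n_tasks
        + vc.r / n_raters
        + vc.tr / (n_tasks * n_raters)
    )
    g_coef = vc.p / (vc.p + rel_err)
    phi_coef = vc.p / (vc.p + abs_err)
    return g_coef, phi_coef


# Placeholder values for illustration only (NOT taken from the study).
placeholder = VarianceComponents(p=1.0, t=0.1, r=0.1, pt=0.5, pr=0.2, tr=0.05, ptr_e=0.6)

# Compare a single-task, two-rater scheme with a two-task, two-rater scheme.
for n_t, n_r in [(1, 2), (2, 2)]:
    g, phi = d_study(placeholder, n_tasks=n_t, n_raters=n_r)
    print(f"tasks={n_t} raters={n_r} G={g:.3f} Phi={phi:.3f}")
```

Under these formulas the dependability coefficient can never exceed the generalizability coefficient, and adding tasks or raters shrinks the error terms. This is why a single-task scheme can fall short when the person-by-task component is large.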
