Abstract

I studied rater effects in the writing and speaking sections of the Test of German as a Foreign Language (TestDaF). Building on the many-facet Rasch measurement methodology, I focused on rater main effects as well as two- and three-way interactions between raters and the other facets involved, that is, examinees, rating criteria (in the writing section), and tasks (in the speaking section). Another goal was to investigate differential rater functioning related to examinee gender. Results showed that raters (a) differed strongly in the severity with which they rated examinees; (b) were fairly consistent in their overall ratings; (c) were substantially less consistent in relation to rating criteria (or speaking tasks, respectively) than in relation to examinees; and (d) as a group, were not subject to gender bias. These findings have implications for controlling and assuring the psychometric quality of the TestDaF rater-mediated assessment system.
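
For orientation, the general form of a many-facet Rasch model with examinee, criterion (or task), and rater facets can be sketched as below. This is the standard textbook formulation and not necessarily the exact specification estimated in this study; the symbols are assumptions chosen for illustration.

```latex
% Generic many-facet Rasch model (rating scale form); notation is illustrative.
% p_{nijk}: probability that examinee n receives category k (rather than k-1)
%           from rater j on criterion or task i
\[
  \ln\!\left(\frac{p_{nijk}}{p_{nij(k-1)}}\right)
  = \theta_n - \beta_i - \alpha_j - \tau_k
\]
% \theta_n : proficiency of examinee n
% \beta_i  : difficulty of criterion (writing) or task (speaking) i
% \alpha_j : severity of rater j
% \tau_k   : difficulty of receiving category k relative to category k-1

% A two-way interaction (for example rater by criterion) is commonly examined
% by adding a bias term \varphi_{ij} to the main-effects model:
\[
  \ln\!\left(\frac{p_{nijk}}{p_{nij(k-1)}}\right)
  = \theta_n - \beta_i - \alpha_j - \varphi_{ij} - \tau_k
\]
```

In this parameterization, a larger \alpha_j means a more severe rater, so finding (a) corresponds to a wide spread of the \alpha_j estimates, and a bias term \varphi_{ij} that is reliably different from zero would indicate that rater j is inconsistent with respect to criterion or task i.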
