Abstract
Alderson (2005) suggests that diagnostic tests should identify strengths and weaknesses in learners' use of language and focus on specific elements rather than global abilities. However, rating scales used in performance assessment have been repeatedly criticized for being imprecise and therefore often resulting in holistic marking by raters (Weigle, 2002). The aim of this study is to compare two rating scales for writing in an EAP context: an 'a priori' developed scale with less specific descriptors of the kind commonly used in proficiency tests, and an empirically developed scale with detailed level descriptors. The validation process involved 10 trained raters applying both sets of descriptors to 100 writing scripts drawn from a large-scale diagnostic assessment administered to both native and non-native speakers of English at a large university. A quantitative comparison of rater behaviour was undertaken using FACETS. Questionnaires and interviews were administered to elicit the raters' perceptions of the efficacy of the two types of scales. The results indicate that rater reliability was substantially higher, and that raters were better able to distinguish between different aspects of writing, when the more detailed descriptors were used. Rater feedback also showed a preference for the more detailed scale. The findings are discussed in terms of their implications for rater training and rating scale development.