Abstract

Through many-facet Rasch measurement (MFRM) analysis, this study explores the rating differences between one computer automatic rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, although the automatic rater shows lower intra-rater reliability than the college-teacher and high-school-teacher raters under stringent infit limits. Neither the automatic rater nor the human raters exhibit a central-tendency or randomness effect. This research provides evidence for the automatic-rating reform of the Computerized English Listening-Speaking Test (CELST) in the Guangdong NMET and encourages the application of MFRM in operational score monitoring.
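For readers less familiar with the method, many-facet Rasch measurement extends the basic Rasch model with additional facets such as raters. A standard three-facet rating-scale formulation from the MFRM literature (Linacre, 1989) is sketched below; the exact specification fitted in this study may differ:

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \beta_i - \alpha_j - \tau_k

where P_{nijk} is the probability that examinee n receives score category k rather than k-1 from rater j on item i, \theta_n is examinee ability, \beta_i is item difficulty, \alpha_j is rater severity, and \tau_k is the threshold of category k. Intra-rater consistency is conventionally screened with the infit mean-square statistic, an information-weighted ratio of observed to expected residual variance:

\text{Infit MS}_j = \frac{\sum_{n,i}\left(x_{nij} - E_{nij}\right)^2}{\sum_{n,i}\operatorname{Var}\left(x_{nij}\right)}

Values near 1 indicate good fit; "stringent infit limits" means a narrow acceptance band around 1 (ranges such as 0.8-1.2 are sometimes used, versus the more lenient 0.5-1.5).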

Highlights

  • In recent years, the National Matriculation English Test (NMET) in China has paid increasing attention to the measurement of communicative ability and productive skills; one good embodiment is the Computerized English Listening and Speaking Test (CELST), introduced in 2011 in Guangdong province

  • CELST meets the requirements of a good oral test, such as focusing on information exchange, creating contextualized situations, and drawing on authenticity to incorporate interactiveness into language communication (Yang, 1999)

  • This research is designed to compare the rating differences between a computer automatic rater and expert teacher raters on the CELST

Introduction

The National Matriculation English Test (NMET) in China has paid increasing attention to the measurement of communicative ability and productive skills; one good embodiment is the Computerized English Listening and Speaking Test (CELST), introduced in 2011 in Guangdong province. Rater effects are inevitable, and the rating process is arduous given the large number of examinees and the need for a quick turnaround of test results. Computer automatic rating is advocated in the hope of saving time and energy, raising the level of fairness, eliminating subjective human rating errors, and delivering test results more efficiently. There is therefore an urgent need to examine the appropriateness and practicality of automatic rating as a replacement for human rating. This research is designed to compare the rating differences between a computer automatic rater and expert teacher raters on the CELST.

