Abstract

To validly assess teachers’ pedagogical content knowledge (PCK), performance-based tasks with open-response formats are required. Automated scoring is considered an appropriate approach to reduce the resource intensity of human scoring and to achieve more consistent scoring results than human raters. This study focuses on the comparability of human and automated scoring of PCK for economics teachers. The answers of (prospective) teachers (N = 852) to six open-response tasks from a standardized and validated test were scored by two trained human raters and by the engine "Educational SCoRIng TOolkit" (ESCRITO). The average agreement between human and computer ratings, κw = .66, suggests convergent validity of the scoring results. The results of the single-factor analysis of variance show a significant influence of the answers on the automated scoring for each homogeneous subgroup (students = 460, trainees = 230, in-service teachers = 162). Findings are discussed in terms of implications for the use of automated scoring in educational assessment and its potentials and limitations.
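For orientation, the weighted kappa coefficient κw reported above is conventionally defined as shown below; the quadratic disagreement weights are only a common choice for ordinal rubric scores, since the exact weighting scheme is not stated in this summary:

    \kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, o_{ij}}{\sum_{i,j} w_{ij}\, e_{ij}},
    \qquad
    w_{ij} = \frac{(i-j)^2}{(k-1)^2}

where o_{ij} are the observed proportions of responses scored in category i by one rater and j by the other, e_{ij} are the proportions expected by chance, and k is the number of score categories.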

Highlights

  • Teaching a subject requires teachers to make the structure and meaning of the learning content accessible to learners, taking into account their individual learning prerequisites and needs (Kersting et al., 2014; Wilson et al., 2018)

  • Automated scoring is subdivided into the scoring of essays by “automated essay scoring” (AES) and the scoring of short-response texts by “automated short answer scoring” (ASAS) (Riordan et al., 2017)

  • With regard to RQ1, the results show almost perfect agreement between the two human raters for all samples (2011: κw = 0.87; 2018: κw = 0.91; 2011/2018: κw = 0.89) (Table 3); a sketch of how such agreement values can be computed follows this list
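
A minimal sketch of how such agreement coefficients can be reproduced, assuming the rubric scores of both human raters and the engine are available as ordinal lists; the variable names, sample values, and quadratic weighting are illustrative assumptions, not details taken from the paper:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical ordinal rubric scores (0-3) for the same set of responses.
    # In the study, two trained human raters and the ESCRITO engine scored
    # six open-response tasks; these lists only illustrate the data format.
    rater_1 = [0, 1, 2, 3, 2, 1, 0, 3]
    rater_2 = [0, 1, 2, 2, 2, 1, 1, 3]
    engine  = [0, 2, 2, 3, 1, 1, 0, 3]

    # Quadratic weighting is a common choice for ordinal scores; the exact
    # weighting scheme used in the paper is not stated in this summary.
    kappa_human_human  = cohen_kappa_score(rater_1, rater_2, weights="quadratic")
    kappa_human_engine = cohen_kappa_score(rater_1, engine, weights="quadratic")

    print(f"human-human  kappa_w: {kappa_human_human:.2f}")
    print(f"human-engine kappa_w: {kappa_human_engine:.2f}")

Per-subgroup comparisons (students, trainees, in-service teachers) could then be obtained by computing the coefficient separately on each subgroup's score vectors.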


Introduction

Teaching a subject requires teachers to make the structure and meaning of the learning content accessible to learners, taking into account their individual learning prerequisites and needs (Kersting et al., 2014; Wilson et al., 2018). To validly assess PCK, performance-based tasks with open-response formats are required (Alonzo et al., 2012; Zlatkin-Troitschanskaia et al., 2019), where test takers can describe their instructional approaches to teaching situations (Shavelson, 2009; Liu et al., 2016). The scoring of open responses by human raters is a resource-intensive process (Dolan and Burling, 2012; Zhang, 2013) and can lead to inconsistencies in the test scores due to personal rater biases, which limits objective, reliable and valid measurement (Bejar, 2012; Liu et al., 2014). Automated scoring is considered an approach to reduce the resource intensity of scoring and to achieve more consistent scoring results (Shermis et al., 2013; Zhang, 2013; Almond, 2014; Burrows et al., 2015). Differences between human and computer-based scorings may exist due to personal and dataset-related influences, for instance, gender or response length, or because of limitations of computer-based modeling (Bridgeman et al., 2012; Ramineni et al., 2012a, 2012b; Perelman, 2014; Zehner et al., 2018).

