Abstract

This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases of automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring, but many of the principles carry over to principled reasoning and workflow management in other use contexts. The overall workflow is described as a series of five phases: assessment design, linguistic component design, model design, model validation, and operational deployment. Each phase has two critical sub-phases and a large number of associated methodological design decisions. Brief examples illustrate the considerations that must be weighed for each decision in the overall decision-making process, revealing the complexities that underlie this work. The article closes with reflections on resource demands and with recommendations for best practices for the interdisciplinary teams who engage in this work, underscoring how it blends scientific rigor with artful practice.
