Abstract

This study provides a validity inferential network to guide the development, interpretation, and use of machine learning-based next-generation science assessments (NGSAs). Given that machine learning (ML) has been broadly implemented in the automatic scoring of constructed responses, essays, simulations, educational games, and interdisciplinary assessments to advance evidence collection and inference about student science learning, we contend that additional validity issues arise for science assessments because of the involvement of ML. These emerging validity issues may not be addressed by prior validity frameworks developed for non-science or non-ML assessments. We thus examine the changes that ML brings to science assessments and identify seven critical validity issues of ML-based NGSAs: the potential risk of misrepresenting the construct of interest, potential confounders arising from the larger number of variables involved, nonalignment between the interpretation and use of scores and the designed learning goals, nonalignment between the interpretation and use of scores and the actual quality of learning, nonalignment between machine scores and rubrics, the limited generalizability of machine algorithmic models, and the limited extrapolation ability of machine algorithmic models. Based on these seven validity issues, we propose a validity inferential network to address the cognitive, instructional, and inferential validity of ML-based NGSAs. To demonstrate the utility of this network, we present an exemplar ML-based next-generation science assessment that was developed using a seven-step ML framework, and we articulate how we used the validity inferential network to ensure accountable assessment design as well as valid interpretation and use of machine scores.
