Prediction of Writing True Scores in Automated Scoring of Essays by Best Linear Predictors and Penalized Best Linear Predictors

Lili Yao, Mo Zhang, Shelby J Haberman

doi:10.1002/ets2.12248

Abstract

Many assessments of writing proficiency that aid in making high‐stakes decisions consist of several essay tasks evaluated by a combination of human holistic scores and computer‐generated scores for essay features such as the rate of grammatical errors per word. Under typical conditions, a summary writing score is provided by a linear combination of the holistic scores and the feature scores. The best linear predictor (BLP) approximates the true composite writing score by such a linear combination of holistic scores and essay feature scores. However, the relationship between computer‐generated feature scores and human scores may depend on subgroup membership, while the same scoring rules must normally be applied to all test takers. To address this, Yao, Haberman, and Zhang proposed the penalized best linear predictor (PBLP), which modifies the conventional BLP method by incorporating a quadratic penalty function. This research report contains full accounts of the BLP results, as well as PBLP results that supplement those of Yao et al., for three assessments of writing that aid in making high‐stakes decisions: the TOEFL iBT® Writing test, the GRE® General Analytical Writing subject test, and the Praxis® Core Academic Skills for Educators: Writing assessment. The results indicate the added value of machine features in predicting composite true writing scores and the effectiveness of the penalty function in reducing the lack of population invariance.
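As a rough illustration of the idea summarized above, the following minimal sketch contrasts a generic best linear predictor with a generic quadratically penalized one. It is not the authors' procedure: the abstract does not specify the form of the penalty, so the penalty matrix, the tuning constant `lam`, the function names, and the toy covariance values are all assumptions made here for illustration. The sketch assumes centered variables, a known covariance matrix of the observed scores, and a known covariance vector between the observed scores and the true composite score.

```python
import numpy as np


def blp_weights(cov_xx, cov_xt):
    """Weights w minimizing E[(T - w'X)^2] for centered X and T.

    The minimizer solves cov_xx @ w = cov_xt.
    """
    return np.linalg.solve(cov_xx, cov_xt)


def penalized_blp_weights(cov_xx, cov_xt, penalty, lam):
    """Weights minimizing E[(T - w'X)^2] + lam * w' penalty w.

    `penalty` is a hypothetical symmetric positive semidefinite matrix;
    the penalty actually used for the PBLP is not given in the abstract.
    The minimizer solves (cov_xx + lam * penalty) @ w = cov_xt.
    """
    return np.linalg.solve(cov_xx + lam * penalty, cov_xt)


# Toy (made-up) values: one holistic score and one feature score.
cov_xx = np.array([[2.0, 0.5],
                   [0.5, 1.0]])   # covariance of observed scores
cov_xt = np.array([1.0, 0.4])     # covariance of observed scores with the true score
penalty = np.eye(2)               # assumed penalty matrix (ridge-like)

w_blp = blp_weights(cov_xx, cov_xt)
w_pen = penalized_blp_weights(cov_xx, cov_xt, penalty, lam=0.5)
print("BLP weights:      ", w_blp)
print("Penalized weights:", w_pen)
```

With `lam` set to 0 the penalized weights coincide with the unpenalized ones; larger values of `lam` trade prediction accuracy for smaller values of whatever quantity the penalty measures.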

Full Text