Abstract

Nursing students’ higher-level thinking skills are ideally assessed through constructed-response items. At the baccalaureate level in North America, however, this exam format has largely fallen into disuse owing to the labor-intensive process of scoring written exam papers. The authors sought to determine whether automated essay scoring (AES) would be an efficient and reliable alternative to human scoring. Four constructed-response exam items were administered to an initial cohort of 359 undergraduate nursing students in 2016 and to a second cohort of 40 students in 2018. The items were graded by two human raters (HR1 and HR2) and an AES software platform. AES approximated or surpassed the agreement and reliability measures achieved by HR1 and HR2 with each other, and AES surpassed both human raters in efficiency. A list of answer keywords was created to increase the efficiency and reliability of AES. Low agreement between the human raters may be explained by rater drift and fatigue, and shortcomings in the development of Item 1 may have reduced its overall agreement and reliability measures. It can be concluded that AES is a reliable and cost-effective means of scoring constructed-response nursing examinations, but further studies employing greater sample sizes are needed to establish this definitively.
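
The abstract reports agreement and reliability between AES and the human raters but does not name the statistics used. As a minimal illustrative sketch only, the following assumes exact percent agreement and Cohen's kappa (with a quadratic-weighted variant common in essay-scoring work) computed over hypothetical item scores on a 0-4 scale; none of the scores or measure choices below come from the study itself.

    # Illustrative sketch: assumed agreement statistics on hypothetical scores
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical item scores from one human rater (HR1) and the AES platform
    hr1_scores = [4, 3, 3, 2, 4, 1, 0, 3, 2, 4]
    aes_scores = [4, 3, 2, 2, 4, 1, 1, 3, 2, 3]

    # Exact percent agreement between the two sets of scores
    exact_agreement = sum(a == b for a, b in zip(hr1_scores, aes_scores)) / len(hr1_scores)

    # Unweighted and quadratic-weighted kappa (the latter credits near-misses)
    kappa = cohen_kappa_score(hr1_scores, aes_scores)
    qw_kappa = cohen_kappa_score(hr1_scores, aes_scores, weights="quadratic")

    print(f"Exact agreement: {exact_agreement:.2f}")
    print(f"Cohen's kappa: {kappa:.2f}, quadratic-weighted kappa: {qw_kappa:.2f}")

In an inter-rater design like the one described, the same comparison would typically be run for each rater pair (HR1 vs. HR2, HR1 vs. AES, HR2 vs. AES) and for each exam item.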
