Abstract

Background

A common feature of performance assessments is the use of human assessors to render judgements on student performance. From a measurement perspective, variability among assessors when assessing students may be viewed as a concern because it negatively impacts score reliability and validity. However, from a contextual perspective, variability among assessors is considered both meaningful and expected. A qualitative examination of assessor cognition when assessing student performance can assist in exploring what components are amenable to improvement through enhanced rater training, and the extent of variability when viewing assessors as contributing their individual expertise. Therefore, the purpose of this study was to explore assessor cognition as a source of score variability in a performance assessment of practice-based competencies.

Method

A mixed-method sequential explanatory study design was used, in which findings from the qualitative strand assisted in the interpretation of results from the quantitative strand. Scores from one objective structured clinical examination (OSCE) were obtained for 95 occupational therapy students. Two generalizability studies were conducted to examine the relative contribution of assessors as a source of score variability and to estimate the reliability of domain and holistic scores. Think-aloud interviews were conducted with eight participants assessing a subset of student performances from the OSCE in which they participated. Findings from the analysis of think-aloud data, together with consideration of assessors' background characteristics, were used to assist in the interpretation of the assessor-related variance component estimates and score reliability.

Results

Results from the two generalizability analyses indicated that the highest-order interaction-error term involving assessors accounted for the second-highest proportion of variance, after student variation. Score reliability was higher in the holistic than in the analytic scoring framework. Verbal analysis of assessors' think-aloud interviews provided evidential support for the quantitative results.

Conclusions

This study provides insight into the nature and extent of assessor variability during a performance assessment of practice-based competencies. Study findings are interpretable from both the measurement and contextual perspectives on assessor cognition. An integrated understanding is important for elucidating the meaning underlying the numerical score, because the defensibility of inferences made about students' proficiencies relies on score quality, which in turn relies on expert judgements.

Highlights

  • A common feature of performance assessments is the use of human assessors to render judgements on student performance

  • Study findings are interpretable from the measurement and contextual perspectives on assessor cognition

  • Previous studies have identified different perspectives on assessor cognition by which to view the causes of assessor variability, including “assessor as trainable” and “assessor as meaningfully idiosyncratic” [3]



Introduction

A common feature of performance assessments is the use of human assessors to render judgements on student performance. Variability among assessors when assessing students may be viewed as a concern because it negatively impacts score reliability and validity. A qualitative examination of assessor cognition when assessing student performance can assist in exploring what components are amenable to improvement through enhanced rater training, and the extent of variability when viewing assessors as contributing their individual expertise. Although performance assessments can provide standardized rating schemes for assessors to evaluate students’ competencies, the subjectivity of human judgements introduces the potential for inter-assessor score variability that negatively impacts assessment quality [2]. Previous studies have identified different perspectives on assessor cognition by which to view the causes of assessor variability, including “assessor as trainable” (aligning with a measurement perspective) and “assessor as meaningfully idiosyncratic” (aligning with a contextual perspective) [3]. The contextual perspective holds that assessors are meaningfully distinctive, such that different inferences and judgements among expert assessors can support a more holistic and context-dependent interpretation of a student’s performance.
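To illustrate the kind of generalizability (G-study) analysis described in the Abstract, the following is a minimal sketch of a single-facet fully crossed design (students × assessors): variance components are estimated from ANOVA mean squares, and a relative G coefficient is computed for the mean score over assessors. The data here are simulated, not the study's actual OSCE scores, and the variable names are hypothetical; the study's own analysis involved additional facets and scoring frameworks.

```python
import numpy as np

# Simulated crossed design: every assessor scores every student.
# Counts loosely mirror the study (95 students, 8 assessors).
rng = np.random.default_rng(0)
n_s, n_a = 95, 8
student_effect = rng.normal(0.0, 1.0, size=(n_s, 1))    # student proficiency
assessor_effect = rng.normal(0.0, 0.3, size=(1, n_a))   # assessor leniency/severity
scores = 5.0 + student_effect + assessor_effect + rng.normal(0.0, 0.8, size=(n_s, n_a))

grand = scores.mean()
student_means = scores.mean(axis=1)
assessor_means = scores.mean(axis=0)

# Sums of squares for the two main effects; the remainder is the
# student-by-assessor interaction confounded with residual error.
ss_s = n_a * ((student_means - grand) ** 2).sum()
ss_a = n_s * ((assessor_means - grand) ** 2).sum()
ss_sa = ((scores - grand) ** 2).sum() - ss_s - ss_a

ms_s = ss_s / (n_s - 1)
ms_a = ss_a / (n_a - 1)
ms_sa = ss_sa / ((n_s - 1) * (n_a - 1))

# Expected-mean-square equations yield the variance-component estimates
# (negative estimates are conventionally truncated at zero).
var_sa = ms_sa                               # interaction/error
var_s = max(0.0, (ms_s - ms_sa) / n_a)       # student (object of measurement)
var_a = max(0.0, (ms_a - ms_sa) / n_s)       # assessor main effect

# Relative G coefficient for a mean over n_a assessors: only variance that
# changes students' relative standing (the interaction/error term) counts.
g_rel = var_s / (var_s + var_sa / n_a)
print(f"student={var_s:.3f} assessor={var_a:.3f} "
      f"interaction/error={var_sa:.3f} G={g_rel:.3f}")
```

Note that the assessor main effect (overall leniency or severity) drops out of the relative G coefficient, which is one reason the interaction-error term, rather than the assessor main effect, is often the larger threat to score reliability.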

Objectives
Methods
Results
Discussion
Conclusion
