Abstract

Objective Structured Clinical Examinations (OSCEs) are used in a variety of high-stakes examinations. The primary goal of this study was to examine factors influencing the variability of assessment scores for mock OSCEs administered to senior anesthesiology residents. Using the American Board of Anesthesiology (ABA) OSCE Content Outline as a blueprint, scenarios were developed for 4 of the ABA skill types: (1) informed consent, (2) treatment options, (3) interpretation of echocardiograms, and (4) application of ultrasonography. Eight residency programs administered these 4 OSCEs to CA-3 residents during a 1-day formative session. Faculty raters scored each station using a global score and checklist items. We used a statistical framework called generalizability theory (G-theory) to estimate the sources of variation (facets) and the reliability (ie, reproducibility) of the OSCE performance scores. Reliability quantifies the consistency, or reproducibility, of learner performance as measured by the assessment. Of the 115 eligible senior residents, 99 participated in the OSCE; the remaining residents were unavailable. Overall, residents correctly performed 84% (standard deviation [SD] 16%, range 38%-100%) of the 36 total checklist items for the 4 OSCEs. On global scoring, the pass rate was 71% for the informed consent station, 97% for treatment options, 66% for interpretation of echocardiograms, and 72% for application of ultrasonography. The reliability estimate expressing the reproducibility of examinee rankings was 0.56 (95% confidence interval [CI], 0.49-0.63), which is reasonable for normative assessments that compare a resident's performance with that of other residents, because over half of the observed variation in total scores was due to variation in examinee ability. The Phi coefficient of 0.42 (95% CI, 0.35-0.50) indicates that criterion-based judgments (eg, pass-fail status) cannot be made reliably; Phi expresses the absolute consistency of a score and reflects how closely the assessment is likely to reproduce an examinee's final score. Overall, the greatest share of variance (14.6%) was due to the person-by-item-by-station (3-way) interaction, indicating that individual residents did well on some items but poorly on others. The variance (11.2%) attributable to residency programs across case items suggests moderate variability among programs in how their residents performed on specific items. Because many residency programs aim to develop their own mock OSCEs, this study provides evidence that programs can create a meaningful mock OSCE experience that is statistically reliable for distinguishing resident performance.
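
To illustrate how the ranking-based coefficient and the Phi coefficient relate to estimated variance components, the sketch below shows a standard G-theory calculation for a simplified, fully crossed person x item x station design. This is a minimal illustration under stated assumptions, not the authors' analysis: the function name, the variance component values, and the simplified design are assumptions, although the facet sizes (4 stations, roughly 9 checklist items per station) mirror the study.

```python
# Minimal sketch, not the authors' analysis: relative (G) and absolute (Phi)
# reliability coefficients computed from variance components, assuming a
# simplified, fully crossed person (p) x item (i) x station (s) design.
# The component values below are illustrative placeholders, not study results.

def g_and_phi(var, n_i, n_s):
    """Return (G, Phi) given a dict of variance components.

    var keys: 'p', 'i', 's', 'pi', 'ps', 'is', 'pis'
    ('pis' includes residual error); n_i items per station, n_s stations.
    """
    # Relative error: only components involving the person facet matter,
    # because overall item/station difficulty does not change rank order.
    rel_err = var['pi'] / n_i + var['ps'] / n_s + var['pis'] / (n_i * n_s)
    # Absolute error adds the facet main effects and their interaction,
    # because absolute scores do depend on how hard the items/stations are.
    abs_err = rel_err + var['i'] / n_i + var['s'] / n_s + var['is'] / (n_i * n_s)

    g = var['p'] / (var['p'] + rel_err)    # norm-referenced (ranking) reliability
    phi = var['p'] / (var['p'] + abs_err)  # criterion-referenced (absolute) reliability
    return g, phi


# Illustrative variance component estimates (arbitrary units).
components = {'p': 10.0, 'i': 5.0, 's': 4.0,
              'pi': 6.0, 'ps': 8.0, 'is': 11.0, 'pis': 15.0}
g, phi = g_and_phi(components, n_i=9, n_s=4)  # 4 stations, ~9 items each
print(f"G = {g:.2f}, Phi = {phi:.2f}")
```

Because the absolute error term includes everything in the relative error term plus the facet effects, Phi can never exceed G for the same data, which is consistent with the pattern reported here (0.42 vs 0.56).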
