Abstract

This study explored the reliability of two simple standard-setting methods used to set passing standards for a standardized patient (SP) examination in physician assistant (PA) education. Fifty-four second-year PA students participated in a multistation SP-based clinical skills examination. Cut scores were set using the Angoff and Borderline Group methods: a panel of PA faculty set cut scores using the Angoff method, while a modified Borderline Group method set cut scores from SP global ratings verified by faculty review. Inter-rater reliability between judges was evaluated with the kappa coefficient (κ) for the Angoff method and the intraclass correlation coefficient (ICC) for the Borderline Group method. The Borderline Group method set an overall cut score for the examination of 76% (95% CI ±5), and the Angoff method set a cut score of 62% (95% CI ±9). Both methods demonstrated sufficient inter-rater reliability (κ 0.60, ICC > 0.70; both significant at p < 0.05), although one case (preoperative history and physical) showed poor inter-rater reliability between judges with the Borderline Group method. The Borderline Group method offered a slightly more reliable cut score than the standard set by the Angoff method but was more challenging to implement, and one case showed poor inter-rater reliability under it. Using SPs to complete global borderline ratings offers one way to make the Borderline Group method more feasible, but it requires a high degree of initial rater calibration and periodic measurement of inter-rater reliability between faculty and SPs.
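For readers unfamiliar with the computations summarized above, the Borderline Group cut score is conventionally the mean checklist score of examinees given a "borderline" global rating, judge agreement on categorical judgments can be summarized with Cohen's kappa, and agreement on continuous ratings with an ICC. The sketch below is a minimal illustration of those three calculations on made-up data; the example scores, rating labels, two-judge kappa, and the ICC(2,1) (Shrout & Fleiss) variant are assumptions for illustration, not the authors' analysis or data.

```python
import numpy as np

# Hypothetical data: percentage scores and SP global ratings for a few examinees.
scores = np.array([81, 74, 69, 78, 72, 85, 76, 70])
ratings = ["pass", "borderline", "fail", "borderline", "borderline", "pass", "pass", "fail"]

# Borderline Group method: the cut score is the mean score of examinees rated "borderline".
borderline_scores = scores[[r == "borderline" for r in ratings]]
print(f"Borderline Group cut score: {borderline_scores.mean():.1f}%")

# Cohen's kappa for two judges' categorical judgments (illustrative labels).
def cohens_kappa(a, b):
    a, b = np.asarray(a), np.asarray(b)
    cats = np.unique(np.concatenate([a, b]))
    p_obs = np.mean(a == b)                                        # observed agreement
    p_exp = sum(np.mean(a == c) * np.mean(b == c) for c in cats)   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

judge1 = ["pass", "fail", "pass", "pass", "fail", "pass"]
judge2 = ["pass", "fail", "pass", "fail", "fail", "pass"]
print(f"kappa = {cohens_kappa(judge1, judge2):.2f}")

# ICC(2,1): two-way random effects, absolute agreement, single rating
# (Shrout & Fleiss, 1979); rows = subjects, columns = raters.
def icc_2_1(x):
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ms_r = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between-subjects MS
    ms_c = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between-raters MS
    sse = ((x - x.mean(axis=1, keepdims=True)
              - x.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    ms_e = sse / ((n - 1) * (k - 1))                              # residual MS
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

ratings_matrix = [[76, 78], [64, 66], [82, 80], [70, 74], [88, 85]]
print(f"ICC(2,1) = {icc_2_1(ratings_matrix):.2f}")
```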
