Abstract

With great interest, many anesthesia educators have read three manuscripts in previous issues of Anesthesiology. Morgan, a well-known and respected researcher in simulation education, and her coauthors presented a thoughtful description of the use of simulation in the evaluation of teamwork in obstetrical practice.1 However, in light of their conclusion that the "study does not support the use of Human Factors Rating Scale for assessment of obstetrical teams" and their recommendation of only limited use of the Global Rating Scale, taken together with Murray and Enarson's reflective editorial2 on the difficulties of assessing teamwork and communication skills, I fear that some anesthesia educators might be tempted to throw the newly born discipline of "simulation-based assessment" out with the proverbial bathwater.

Morgan et al.'s investigation1 raises several issues that need to be addressed in light of our urgent need to develop authentic teaching and assessment of clinical competency in anesthesiology.3 The most pressing is our need for reliable and valid performance assessment tools in anesthesia education, training, and practice.

Morgan et al. found the Human Factors Rating Scale and Global Rating Scale assessment tools to be of limited reliability in the obstetrical setting; however, they do not further examine sources of variance in the reliability other than the raters themselves. Although the number of raters, the number of items, and the occasions of testing seem to be sufficient, we have no definitive analysis of this. Classical Test Theory, with its use of interrater reliability and inter- and intraclass correlations to diagnose measurement error, does not provide a view of the relative importance or interactions of these and other sources of variance. Modern Test Theory, specifically Generalizability Theory, provides an analysis of multiple sources of variance and a determination of optimal sampling not only for raters but also for subjects, items, and occasions of testing.4–6 In this investigation,1 the nonconcordance of the correlations of both the Human Factors Rating Scale and the Global Rating Scale suggests that something else is going on here. It could be the result of nonparallel scenarios, lack of rater training, or (more significantly) faulty construct validity of the Human Factors Rating Scale and Global Rating Scale for the anesthesia Crisis Resource Management trait. However, we cannot know from this report.

On review of the original development of the Human Factors Rating Scale7 and its 2000 revision, we still do not have a formal psychometric analysis of its construct validity and factor structure "due to limited sample size."8 Although its authors claim that the items cluster around "team roles, authority/command structure, stress recognition and organizational climate,"8 these highly complex behaviors warrant more formal factor analysis before general use. We as anesthesiologists would not use a new clinical test without knowing its specificity and sensitivity; we should be equally rigorous about the validity and reliability of instruments used for high-stakes testing and resource-intensive instruction. Modern Test Theory offers many advantages necessary for the authentic assessment of the complex cognitive, technical, and behavioral skills in simulation-based education and performance assessment.9

Stanford University School of Medicine, Stanford, California. edlera@stanford.edu
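To make the contrast with interrater correlations concrete, consider a minimal Generalizability Theory sketch, assuming a fully crossed subject × rater × occasion design (the actual design of Morgan et al.'s study is not detailed here, so the facets and notation below are illustrative). A G-study decomposes each observed score into variance components and then summarizes score dependability for relative decisions as a generalizability coefficient:

\[
X_{sro} = \mu + \nu_s + \nu_r + \nu_o + \nu_{sr} + \nu_{so} + \nu_{ro} + \nu_{sro,e}
\]
\[
E\rho^2 = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_{sr}/n_r + \sigma^2_{so}/n_o + \sigma^2_{sro,e}/(n_r n_o)}
\]

Here \(\sigma^2_s\) is true variance among the subjects (teams) being assessed, and the denominator's error terms shrink as the numbers of raters \(n_r\) and testing occasions \(n_o\) increase. A subsequent decision study simply varies \(n_r\) and \(n_o\) to find the smallest sampling plan that yields an acceptable coefficient; this is the determination of optimal sampling that interrater and intraclass correlations alone cannot supply.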
