Abstract
It is believed that the oral examination can be used to evaluate problem-solving ability, communication and collaboration skills, and expert content knowledge. In spite of its widespread acceptance, this examination has been criticized for its lack of reliability and validity and for the high cost of its administration. Reliability is a measure of both the consistency and precision of a testing tool. The three main sources of variability (decreased reliability) in the oral examination process are: 1) examiner-related variability; 2) examination-related variability; and 3) candidate-related variability.

In their paper entitled “Poor interrater reliability on mock anesthesia oral examinations” in this edition of the Journal, Jacobsohn, Klock, Avidan and the Oral Examination Group present a study demonstrating poor inter-rater reliability in a mock oral examination in which raters graded in true isolation.2 Twenty-five residents were examined in a mock examination process resembling the American Board of Anesthesiology (ABA) format on two occasions six weeks apart (E1 and E2). The examinations were videotaped and scored in isolation by three experienced ABA examiners and three experienced Royal College of Physicians and Surgeons of Canada (RCPSC) examiners. The examiners were provided with a standardized scoring system and an educational package to aid with standard setting. Inter-rater reliability, as determined using intraclass correlation coefficients, was poor: 0.243 (0.177–0.305) for E1 and 0.405 (0.331–0.470) for E2. For 48% of the candidates examined, the chance of passing or failing was examiner dependent.

Previous studies have demonstrated significantly better inter-rater reliability in the anesthesia oral examination process. Schubert reported inter-rater reliability as generalized reliability coefficients for both the final grade received and the pass-fail determination on 441 practice oral examinations given to 190 residents using the ABA format.3 Inter-rater reliability was 0.72 for the final grade received and 0.68 for the pass-fail determination. This compares favourably with the results found by Kearney in a study using a structured oral examination format for practice examinations, similar to that currently used by the RCPSC.4 Twenty faculty examined 26 residents from two Canadian residency programs (sites A and B). Standardized questions were scored using global rating scales with anchored performance criteria. Each candidate was scored by a pair of examiners at the initial session and again subsequently from a videotaped recording. Inter-rater agreement was 0.51 at time 1 and 0.79 at time 2 for site A, and 0.71 at time 1 and 0.48 at time 2 for site B. These results were classified as fair to good inter-rater reliability, and the wide range of correlations found was attributed to several study limitations. The residents examined were at different levels of training, with 25% presenting for their first practice oral examination. Evidence suggests that examiners are less consistent when rating poor performances.5
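The intraclass correlation coefficients quoted above summarize how closely different examiners' scores for the same candidates agree. As a purely illustrative sketch (not drawn from the studies discussed, and using invented scores), the following Python computes one common form, the two-way random-effects, single-rater, absolute-agreement coefficient ICC(2,1) of Shrout and Fleiss, from a candidates-by-examiners score matrix; values near 0 indicate examiner-dependent scores, values near 1 indicate near-perfect agreement.

```python
# Illustrative only: ICC(2,1) for a matrix of examiner ratings.
# All scores below are hypothetical and not taken from the editorial.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for an (n candidates x k examiners) matrix of scores."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-candidate means
    col_means = ratings.mean(axis=0)   # per-examiner means

    # Two-way ANOVA sums of squares
    ss_total = ((ratings - grand_mean) ** 2).sum()
    ss_rows = k * ((row_means - grand_mean) ** 2).sum()   # candidates
    ss_cols = n * ((col_means - grand_mean) ** 2).sum()   # examiners
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Shrout & Fleiss two-way random, single rater, absolute agreement
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical example: 5 candidates each rated by 3 examiners on a 0-100 scale.
scores = np.array([
    [72, 65, 80],
    [55, 70, 48],
    [90, 78, 85],
    [60, 52, 75],
    [81, 66, 70],
])
print(round(icc_2_1(scores), 3))  # a low value mirrors the poor agreement reported above
```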