Abstract

Background
Managers and clinical directors have an ethical and administrative responsibility, and a professional duty, to identify clinicians who are performing significantly better or worse than their peers. At the University of Iowa, faculty anesthesiologists (raters) working with Certified Registered Nurse Anesthetists (ratees) evaluate the nurse anesthetists' clinical performance daily using a valid and reliable work habits scale. Because the evaluations are used for ongoing professional practice evaluation and performance reviews, rater bias should be reduced. However, adjusting for rater bias (leniency) causes many raters' evaluations to have little influence on the ranking of ratees.

Methods
This retrospective cohort study was conducted at a large teaching hospital. Functionally, ratings are binary: the value is 1 when all six items in the scale are scored at the maximum performance, and 0 otherwise (i.e., any item less than the maximum). The 40,027 ratings collected over 5.8 years were sorted by rater in descending order of date. The Shannon information content of the ratings was quantified using binomial entropy. The most recent 6359 ratings, from 2020, were analyzed using mixed effects logistic regression, with each rater as a fixed effect and ratees as random effects.

Results
Using all 74 raters, the Spearman correlation coefficient between the precisions of the raters' coefficients in the logistic regression and the corresponding binomial entropies was 0.997 (99% confidence interval [CI] 0.992 to 0.999). Excluding the 33 raters whose ratings were all 1's or all 0's, the Spearman correlation for the remaining 41 raters was 0.985 (99% CI 0.965 to 0.997). Those 41 raters had a median binomial entropy equal to 76% of the normalized maximum entropy (99% CI 62% to 86%), while the other 33 raters' entropy was 0%. When the same rater gave the same rating more than 10 times in a row, there was a median of 23 more identical ratings in the run (99% CI 21 to 26, N = 535).

Conclusions
Most loss of information originates from raters who give all ratees the largest possible score for all items and from raters who never give the maximum score. Educative feedback should be provided to raters who consistently rate all ratees the same, both in departments with quantitative evaluations using a reliable scale and in departments using qualitative evaluations.
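For readers who want to reproduce the information measure, a minimal sketch of the per-rater binomial entropy calculation follows. This is an illustration, not the authors' code; the DataFrame, the column names (rater, score), and the toy data are hypothetical assumptions. The binomial (Bernoulli) entropy is H(p) = -p*log2(p) - (1 - p)*log2(1 - p), where p is a rater's proportion of maximum-score (1) ratings.

    import numpy as np
    import pandas as pd

    def binomial_entropy(p):
        """Shannon entropy, in bits, of a Bernoulli variable with success probability p."""
        if p in (0.0, 1.0):
            return 0.0  # raters giving all 1's or all 0's: ratings carry no information
        return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

    # Toy data for illustration only; 'score' is 1 when all six scale items
    # are at the maximum performance, 0 otherwise.
    ratings = pd.DataFrame({
        "rater": ["A", "A", "A", "B", "B", "B"],
        "score": [1, 0, 1, 1, 1, 1],
    })

    # Proportion of maximum-score ratings per rater, then entropy per rater.
    p_by_rater = ratings.groupby("rater")["score"].mean()
    entropy_by_rater = p_by_rater.apply(binomial_entropy)
    print(entropy_by_rater)  # rater A: ~0.918 bits; rater B: 0.0 bits

Because the maximum binomial entropy is 1 bit (at p = 0.5), these values can be read directly as fractions of the normalized maximum (e.g., 0.76 corresponds to the 76% reported in the Results). A rater at 0 bits, like rater B above, contributes no information to the ranking of ratees, which is the information loss the study quantifies.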
