Performance Improvement (Pi) score: an algorithm to score Pi objectively during E-BLUS hands-on training sessions. A European Association of Urology, Section of Uro-Technology (ESUT) project.

Domenico Veneziano, Chandra S Biyani, J.F Langenhuijsen, Christian Wagner, Antonio Canova, Andreas Skolarikos, John D Beatty, Bhaskar Somani, Giles O Hellawell, Giovannalberto Pini, Oscar Rodriguez Faba, Federico Dehò, Giovanni Tripepi, Cristian Fiori, Theodoros Tokas, Giampaolo Siena, Michiel Arnolds, Ben S.E.P Van Cleynenbreugel, Estevao Lima

doi:10.1111/bju.14621

Abstract

This study aimed to evaluate the variability of tutors' subjective performance improvement (Pi) assessment and to compare it with a novel measurement algorithm: the Pi score. The Pi-score algorithm takes the time measurement and the number of errors from two different repetitions (the first and the fifth) of the same training task and compares them with the relative task goals to produce an objective score. We collected data during eight courses on the four tasks of the European Association of Urology training in Basic Laparoscopic Urological Skills (E-BLUS). The same tutor instructed on all courses. The collected data were independently analysed by 14 hands-on training experts for Pi assessment. Their subjective Pi assessments were compared for inter-rater reliability. The average per-participant subjective scores from all 14 proctors were then compared with the objective Pi-score algorithm results. Cohen's κ statistic was used for comparison analysis. A total of 50 participants were enrolled. Concordance among the 14 proctors' scores was as follows: Task 1, κ = 0.42 (moderate); Task 2, κ = 0.27 (fair); Task 3, κ = 0.32 (fair); and Task 4, κ = 0.55 (moderate). Concordance between Pi-score results and average proctor scores per participant was as follows: Task 1, κ = 0.85 (almost perfect); Task 2, κ = 0.46 (moderate); Task 3, κ = 0.92 (almost perfect); and Task 4, κ = 0.65 (substantial). The present study shows that evaluation of Pi is highly variable, even when formulated by a cohort of experts. Our algorithm successfully provided an objective score that was equal to the average Pi assessment of a cohort of experts, over a small number of training attempts.
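For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's κ for two raters, together with the conventional Landis and Koch interpretation bands that match the labels quoted in the abstract (fair, moderate, substantial, almost perfect). The function names `cohen_kappa` and `agreement_label` are hypothetical, and the example ratings are made up for illustration. The abstract does not state how the Pi assessments were categorised or how κ was applied across 14 proctors, so this sketch covers only the basic two-rater case and is not the study's analysis code.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially complete.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map a kappa value to the Landis and Koch descriptive bands."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Illustrative (made-up) ratings: 1 = improved, 0 = not improved.
example_kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
example_label = agreement_label(example_kappa)
```

Applied to the values reported above, `agreement_label(0.27)` falls in the "fair" band and `agreement_label(0.92)` in the "almost perfect" band, consistent with the labels given in the abstract.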
