Journal of Urology
Surgical Technology & Simulation: Training & Skills Assessment I
1 Apr 2018

MP01-05 HOW ACCURATE IS THE CROWD: DETERMINING THE DISCRIMINATORY CAPABILITY OF EXPERT AND CROWDSOURCED GEARS EVALUATIONS FOR THE ROBOTIC ASSISTED RADICAL PROSTATECTOMY

Paul Oh, Jian Chen, Anthony Jarc, Micha Titus, and Andrew Hung
https://doi.org/10.1016/j.juro.2018.02.111

INTRODUCTION AND OBJECTIVES
Evaluation of surgical video by experts to assess robotic performance is expensive and time-consuming compared to crowdsourcing from the general population. This study evaluates the ability of experts and the Crowd to discriminate clinical outcomes and automated performance metrics using Global Evaluative Assessment of Robotic Skills (GEARS) scores.

METHODS
Automated performance metrics (instrument motion tracking and system events) of 25 anterior vesico-urethral anastomoses of robotic-assisted radical prostatectomies were captured by the dVLogger (Intuitive Surgical). Video footage was graded with GEARS by 4 experts (averaged) and the Crowd (C-SATS), composed of 807 crowdworkers. GEARS scores (individual domains and total) were stratified into quartiles (Top 25% = Q1, Middle 50% = Q2, Bottom 25% = Q3), and each quartile's outcomes and metrics were compared using ANOVA with post-hoc Tukey analysis to determine the discriminatory strength of GEARS.
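The stratify-then-compare procedure in the METHODS can be sketched as follows. This is a minimal illustration on synthetic data, not the study's actual analysis: the variable names (`gears_total`, `metric`) and the sample values are hypothetical, and only the overall shape (quartile strata, one-way ANOVA, Tukey's HSD post hoc) follows the abstract.

```python
# Hedged sketch of the quartile-stratified comparison described in METHODS.
# Data are synthetic; names and values are illustrative, not from the study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic total GEARS scores and one automated performance metric
# (e.g. dominant-arm velocity) for 25 anastomoses.
gears_total = rng.uniform(10, 30, size=25)
metric = 0.5 * gears_total + rng.normal(0, 2, size=25)

# Stratify by GEARS score: Q1 = top 25%, Q2 = middle 50%, Q3 = bottom 25%.
q25, q75 = np.percentile(gears_total, [25, 75])
groups = [
    metric[gears_total >= q75],                          # Q1: top 25%
    metric[(gears_total > q25) & (gears_total < q75)],   # Q2: middle 50%
    metric[gears_total <= q25],                          # Q3: bottom 25%
]

# One-way ANOVA across the three strata...
f_stat, p_anova = stats.f_oneway(*groups)

# ...followed by Tukey's HSD to locate which pairs of strata differ.
tukey = stats.tukey_hsd(*groups)
print(f"ANOVA p = {p_anova:.3f}")
print(tukey)
```

In practice this loop would be repeated once per outcome or automated performance metric, for each grader (Crowd vs. averaged experts) and each GEARS domain.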
RESULTS
Individual GEARS domains: The Crowd's strongest discriminatory ability was in Efficiency (5 differences in automated performance metrics, p<0.04) and Bimanual Dexterity (BD; 4 metrics, p<0.05), while it was least effective in Robotic Control (RC; 1 metric, p=0.04) and Depth Perception (DP; 0 metrics). Averaged Expert scores had similar discriminatory ability in Efficiency and BD, but they also detected differences in RC (1 outcome, 2 metrics; p<0.03) and DP (1 outcome, 1 metric; p<0.04). Estimated blood loss (EBL) was the one outcome differentiated by Averaged Experts in Efficiency, RC, and DP (p<0.05).
Total GEARS (Figure): The Crowd's Total GEARS quartiles detected differences in 4 metrics (p<0.043), compared to 1 metric (p=0.013) for the Averaged Experts. Both the Crowd and Averaged Experts consistently detected differences in dominant-arm velocity across all GEARS domains.

CONCLUSIONS
The Crowd's individual GEARS scores can detect differences in surgical maneuvers, but not to the extent of experts' scores, which also detect differences in EBL. However, the Crowd is more discriminatory than experts when using Total GEARS. Given the feasibility of assessing numerous surgeries, the Crowd retains a strong role as an evaluator when Total GEARS scores are used.

© 2018. Journal of Urology, Volume 199, Issue 4S, April 2018, Page e3.