Study ObjectiveTo answer the question of whether there is a difference between robotic virtual reality simulator performance assessment and validated human reviewers. Current surgical education relies heavily on simulation. Several assessment tools are available to the trainee, including the actual robotic simulator assessment metrics and the Global Evaluative Assessment of Robotic Skills (GEARS) metrics, both of which have been independently validated. GEARS is a rating scale through which human evaluators can score trainees' performances on 6 domains: depth perception, bimanual dexterity, efficiency, force sensitivity, autonomy, and robotic control. Each domain is scored on a 5-point Likert scale with anchors. We used 2 common robotic simulators, the dV-Trainer (dVT; Mimic Technologies Inc., Seattle, WA) and the da Vinci Skills Simulator (dVSS; Intuitive Surgical, Sunnyvale, CA), to compare the performance metrics of robotic surgical simulators with the GEARS for a basic robotic task on each simulator. DesignA prospective single-blinded randomized study. SettingA surgical education and training center. ParticipantsSurgeons and surgeons in training. InterventionsDemographic information was collected including sex, age, level of training, specialty, and previous surgical and simulator experience. Subjects performed 2 trials of ring and rail 1 (RR1) on each of the 2 simulators (dVSS and dVT) after undergoing randomization and warm-up exercises. The second RR1 trial simulator performance was recorded, and the deidentified videos were sent to human reviewers using GEARS. Eight different simulator assessment metrics were identified and paired with a similar performance metric in the GEARS tool. The GEARS evaluation scores and simulator assessment scores were paired and a Spearman rho calculated for their level of correlation. Measurements and Main ResultsSeventy-four subjects were enrolled in this randomized study with 9 subjects excluded for missing or incomplete data. There was a strong correlation between the GEARS score and the simulator metric score for time to complete versus efficiency, time to complete versus total score, economy of motion versus depth perception, and overall score versus total score with rho coefficients greater than or equal to 0.70; these were significant (p < .0001). Those with weak correlation (rho ≥0.30) were bimanual dexterity versus economy of motion, efficiency versus master workspace range, bimanual dexterity versus master workspace range, and robotic control versus instrument collisions. ConclusionOn basic VR tasks, several simulator metrics are well matched with GEARS scores assigned by human reviewers, but others are not. Identifying these matches/mismatches can improve the training and assessment process when using robotic surgical simulators.