Virtual Reality vs Dry Laboratory Models: Comparing Automated Performance Metrics and Cognitive Workload During Robotic Simulation Training.

Andrew Cowan, Samuel Mingo, Sharath S Reddy, Andrew J Hung, Jian Chen, Jessica H Nguyen, Runzhuo Ma, Sandra Marshall

doi:10.1089/end.2020.1037

Abstract

Background: This study compares surgical performance during analogous vesico-urethral anastomosis (VUA) tasks in two robotic training environments, virtual reality (VR) and dry laboratory (DL), to investigate transferability of skill assessment across the two platforms. Using computer-generated performance metrics and pupillary data, we evaluated whether the two environments can distinguish surgical expertise and, ultimately, whether performance in the VR simulation correlates with live robotic performance in the DL. Materials and Methods: Experts (≥300 cases) and trainees (<300 cases) performed analogous VUAs during VR and DL sessions on a da Vinci robotic console following an Institutional Review Board (IRB) approved protocol (HS-16-00318). Twenty-two metrics were generated in each environment (kinematic metrics, tissue metrics, and biometrics). The DL included 18 previously validated automated performance metrics (APMs) (kinematics and event metrics) captured by an Intuitive system data recorder. In both settings, Tobii Pro Glasses 2 recorded the task-evoked pupillary response (reported as Index of Cognitive Activity [ICA]) to indicate cognitive workload, analyzed by EyeTracking cognitive workload software. Pearson correlation, Mann-Whitney, and independent t-tests were used for the comparative analyses. Results: Our study included six experts (median caseload 1300 [interquartile range 400-3000]) and 11 trainees (25 [0-250]). A total of 8/9 metrics directly comparable between VR and DL showed significant positive correlation (r ≥ 0.554, p ≤ 0.032); 5/22 VR metrics distinguished expertise, including task time (p = 0.031), clutch usage (p = 0.040), unnecessary needle piercing (p = 0.026), and suspected injury to the endopelvic fascia (p = 0.040). This contrasts with 14/22 APMs in DL (p ≤ 0.038), including linear velocities of all three instruments (p ≤ 0.038) and dominant-hand instrument wrist articulation (p = 0.013). Trainees experienced higher cognitive workload (ICA) than experts in both environments (p < 0.036). Conclusions: Most performance metrics comparable between VR and DL exhibited moderate to strong correlations, showing transferability of skills across the platforms. Comparing training environments, APMs during DL tasks are better able to distinguish expertise than VR-generated metrics.

Full Text