Assessment is central to modern surgical education, both to monitor progress and to document sufficient skills. Virtual reality (VR) temporal bone simulators allow automated tracking of basic metrics such as time, volume removed, and collisions. Adequate performance assessment, however, also requires a compound rating of the stepwise bony excavation and of the exposure and preservation of soft tissue structures. Assessment of this complexity requires further development of automated routines in the VR simulation environment. In this study, we present the integration of automated mastoidectomy final-product assessment into a VR simulator and its validation against manual rating. At two international temporal bone courses, 33 otorhinolaryngology (ORL) trainees performed anatomical mastoidectomies in the Visible Ear (VR) Simulator, with automatic performance assessment by a newly implemented rating routine based on the modified Welling Scale. Automated assessment was compared with manual ratings by experts using absolute agreement, intraclass correlation, and generalizability analysis to establish validity and reliability. The overall average agreement between manual and automatic assessment was 83.9%, compared with an inter-rater agreement of 88.9%. A majority of items (15 out of 26) showed high agreement between automated and manual rating (>85%). Intraclass correlation coefficients were high. Generalizability analysis with D-studies found that five repetitions per participant are needed for a G coefficient >0.8, which is considered necessary for high-stakes assessments. We have demonstrated the feasibility, validity, and reliability of an automatic assessment system integrated into a VR temporal bone simulator. Such a system could become an important tool for future self-directed training with skills certification.
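For readers unfamiliar with the statistics named above, the following is a minimal sketch of two of them: per-item absolute agreement between two sets of ratings, and a simple D-study projection of the G coefficient over the number of repetitions. It is not the study's analysis code. It assumes binary (0/1) item scores and a one-facet design in which error variance is divided by the number of repetitions; the function names and any variance components passed in are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def percent_agreement(
    auto: Sequence[Sequence[int]], manual: Sequence[Sequence[int]]
) -> list[float]:
    """Per-item absolute agreement (%) between two sets of ratings.

    Each argument is a list of performances; each performance is a list of
    item scores (assumed binary, one entry per scale item).
    """
    n_performances = len(auto)
    n_items = len(auto[0])
    agreement = []
    for item in range(n_items):
        matches = sum(
            1 for perf in range(n_performances)
            if auto[perf][item] == manual[perf][item]
        )
        agreement.append(100.0 * matches / n_performances)
    return agreement


def g_coefficient(var_person: float, var_residual: float, n_repetitions: int) -> float:
    """G coefficient for a one-facet design (assumed, simplified):
    person variance over person variance plus residual variance / n."""
    return var_person / (var_person + var_residual / n_repetitions)


def repetitions_needed(
    var_person: float, var_residual: float,
    threshold: float = 0.8, max_repetitions: int = 50,
) -> Optional[int]:
    """Smallest number of repetitions whose projected G exceeds the threshold,
    or None if the threshold is not reached within max_repetitions."""
    for n in range(1, max_repetitions + 1):
        if g_coefficient(var_person, var_residual, n) > threshold:
            return n
    return None
```

In practice the variance components would be estimated from the rating data (for example with a mixed-effects model) before being passed to `repetitions_needed`; the threshold of 0.8 mirrors the high-stakes criterion mentioned in the abstract.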