Abstract
Surgical skill assessment currently hinges on manual observation by senior surgeons, a process that is inherently time-consuming and subjective. Hence, there is a need for machine learning-based automated robotic surgical skill assessment. However, existing machine learning-based methods operate in either the time domain or the frequency domain alone and have not investigated the time–frequency domain. To fill this research gap, we explore representing surgical motion data in the time–frequency domain. In this study, we propose a novel automated robotic surgical skill assessment framework called Continuous Wavelet Transform-Vision Transformer (CWT-ViT). We apply the continuous wavelet transform, a time–frequency representation method, to convert robotic surgery kinematic data into synthesized images. Furthermore, taking advantage of prior knowledge of the da Vinci surgical system, we design a four-branch architecture in which each branch represents one robotic manipulator. We have conducted extensive experiments and achieved comparable results on the benchmark robotic surgical skill dataset, the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our proposed CWT-ViT framework demonstrates the feasibility of applying time–frequency representations to automated robotic surgical skill assessment using kinematic data. The code is available at https://github.com/yiming95/CWT-ViT-Surgery.
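To make the conversion step concrete, the following is a minimal sketch of how one kinematic channel might be turned into a CWT scalogram image using the PyWavelets library. The wavelet choice ('morl'), the scale range, the 30 Hz sampling period, and the synthetic input trace are illustrative assumptions; the paper's actual preprocessing and parameter choices may differ.

```python
# Hedged sketch: converting a 1-D kinematic signal into a 2-D CWT scalogram.
# Assumptions (not confirmed by the paper): Morlet wavelet, 64 scales,
# ~30 Hz sampling, and a synthetic trace standing in for JIGSAWS data.
import numpy as np
import pywt
import matplotlib.pyplot as plt

def kinematic_to_scalogram(signal: np.ndarray,
                           sampling_period: float = 1.0 / 30.0,
                           num_scales: int = 64) -> np.ndarray:
    """Convert a 1-D kinematic channel (e.g., one manipulator's velocity)
    into a 2-D CWT scalogram with shape (scales, time)."""
    scales = np.arange(1, num_scales + 1)
    coeffs, _freqs = pywt.cwt(signal, scales, "morl",
                              sampling_period=sampling_period)
    # The coefficient magnitudes form the image fed to the vision model.
    return np.abs(coeffs)

# Example: a synthetic 10 s trace sampled at 30 Hz.
t = np.linspace(0, 10, 300)
trace = np.sin(2 * np.pi * 1.5 * t) + 0.3 * np.random.randn(t.size)
scalogram = kinematic_to_scalogram(trace)

plt.imshow(scalogram, aspect="auto", cmap="viridis")
plt.xlabel("Time step")
plt.ylabel("Scale")
plt.title("CWT scalogram of one kinematic channel")
plt.savefig("scalogram.png")
```

Under the four-branch design described above, one would presumably build such scalograms per manipulator and feed each group to its own ViT branch before fusing the features, though the exact channel grouping is not specified in the abstract.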