Abstract
This paper presents a deep learning approach to recognize and predict surgical activity in robot-assisted minimally invasive surgery (RAMIS). Our primary objective is to deploy the developed model for implementing a real-time surgical risk monitoring system within the realm of RAMIS. We propose a modified Transformer model with the architecture comprising no positional encoding, 5 fully connected layers, 1 encoder, and 3 decoders. This model is specifically designed to address 3 primary tasks in surgical robotics: gesture recognition, prediction, and end-effector trajectory prediction. Notably, it operates solely on kinematic data obtained from the joints of robotic arm. The model's performance was evaluated on JHU-ISI Gesture and Skill Assessment Working Set dataset, achieving highest accuracy of 94.4% for gesture recognition, 84.82% for gesture prediction, and significantly low distance error of 1.34mm with a prediction of 1s in advance. Notably, the computational time per iteration was minimal recorded at only 4.2ms. The results demonstrated the excellence of our proposed model compared to previous studies highlighting its potential for integration in real-time systems. We firmly believe that our model could significantly elevate realms of surgical activity recognition and prediction within RAS and make a substantial and meaningful contribution to the healthcare sector.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have