Abstract

Temporally locating and classifying instruments in surgical video is useful for the analysis and comparison of surgical techniques. This paper applies action segmentation techniques to temporally segment and classify surgical instruments, and highlights the utility of this modelling approach through example applications. It shows that the action segmentation transformer (ASFormer) architecture with an EfficientNetV2 featurizer achieves significantly higher mean average precision than previous approaches to this task on the Cholec80 dataset. The ASFormer also outperforms Long Short-Term Memory (LSTM) and Multi-Stage Temporal Convolutional Network (MS-TCN) architectures using the same featurizer. This model reduces the need for costly human labelling of surgical video, driving the development of indexed surgical video libraries and instrument usage tracking applications. Examples of these applications are included after the results.
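
To make the two-stage pipeline concrete, the sketch below extracts per-frame features with a pretrained EfficientNetV2-S backbone (via torchvision) and passes them to a simplified dilated temporal-convolution head that stands in for the full ASFormer encoder-decoder described in the abstract. This is an illustrative assumption of the pipeline, not the authors' implementation: the class names, hidden width, and layer count are made up for the example, and only the seven output classes (the Cholec80 instrument labels) come from the task itself.

```python
# Minimal sketch of the assumed pipeline: EfficientNetV2-S frame features followed by
# a simplified temporal segmentation head. The real ASFormer adds attention and
# multi-stage refinement; this stand-in only illustrates the data flow.
import torch
import torch.nn as nn
from torchvision import models


class FrameFeaturizer(nn.Module):
    """Per-frame feature extractor using a pretrained EfficientNetV2-S backbone."""

    def __init__(self):
        super().__init__()
        backbone = models.efficientnet_v2_s(
            weights=models.EfficientNet_V2_S_Weights.DEFAULT  # downloads weights
        )
        backbone.classifier = nn.Identity()   # keep the 1280-d pooled features
        self.backbone = backbone.eval()

    @torch.no_grad()
    def forward(self, frames):                # frames: (T, 3, H, W)
        return self.backbone(frames)          # (T, 1280)


class TemporalSegmenter(nn.Module):
    """Simplified single-stage temporal model (residual dilated 1-D convs) over
    frame features; hyperparameters here are illustrative, not the paper's."""

    def __init__(self, in_dim=1280, hidden=64, num_classes=7, num_layers=8):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, hidden, 1)
        self.layers = nn.ModuleList(
            nn.Conv1d(hidden, hidden, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(num_layers)
        )
        self.out = nn.Conv1d(hidden, num_classes, 1)

    def forward(self, feats):                 # feats: (T, in_dim)
        x = self.proj(feats.t().unsqueeze(0)) # (1, hidden, T)
        for layer in self.layers:
            x = x + torch.relu(layer(x))      # residual dilated conv block
        return self.out(x).squeeze(0).t()     # (T, num_classes) per-frame logits


if __name__ == "__main__":
    frames = torch.rand(16, 3, 224, 224)      # 16 dummy video frames
    feats = FrameFeaturizer()(frames)
    logits = TemporalSegmenter()(feats)       # per-frame instrument logits
    probs = torch.sigmoid(logits)             # multi-label presence probabilities
    print(probs.shape)                        # torch.Size([16, 7])
```

Sigmoid outputs are used because instrument recognition on Cholec80 is a multi-label problem: several instruments can be present in the same frame.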
