Abstract

Temporally locating and classifying instruments in surgical video is useful for the analysis and comparison of surgical techniques. This paper applies action segmentation techniques to temporally segment and classify surgical instruments, and highlights the utility of this modelling approach through example applications. It shows that the action segmentation transformer (ASFormer) architecture with an EfficientNetV2 featurizer achieves significantly higher mean average precision than any previous approach to this task on the Cholec80 dataset. The ASFormer also outperforms Long Short-Term Memory (LSTM) and Multi-Stage Temporal Convolutional Network (MS-TCN) architectures with the same featurizer. This model reduces the need for costly human labelling of surgical video, supporting the development of indexed surgical video libraries and instrument usage tracking applications. Examples of these applications are included after the results.
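
To make the two-stage setup concrete, the sketch below illustrates the general pipeline described above: a pretrained EfficientNetV2 backbone extracts per-frame features, and a temporal model then classifies instrument presence frame by frame. This is a minimal illustration only, assuming a torchvision EfficientNetV2-S backbone and using a simple LSTM stand-in for the temporal head; the paper's ASFormer and MS-TCN heads, training procedure, and hyperparameters are not reproduced here, and the class count and tensor shapes are illustrative.

```python
# Minimal sketch (not the paper's code): per-frame features from a pretrained
# EfficientNetV2 backbone, fed to a simple temporal model. The LSTM head below
# is an illustrative stand-in for the temporal architectures compared in the
# paper (LSTM, MS-TCN, ASFormer); shapes and sampling are assumptions.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

NUM_INSTRUMENT_CLASSES = 7  # Cholec80 annotates seven instrument types

# Frozen spatial featurizer: drop the classification head, keep pooled features.
weights = EfficientNet_V2_S_Weights.DEFAULT
backbone = efficientnet_v2_s(weights=weights)
backbone.classifier = nn.Identity()
backbone.eval()
preprocess = weights.transforms()


class TemporalHead(nn.Module):
    """Stand-in temporal model operating on a sequence of frame features."""

    def __init__(self, feat_dim=1280, hidden=256, num_classes=NUM_INSTRUMENT_CLASSES):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, feats):        # feats: (batch, time, feat_dim)
        out, _ = self.rnn(feats)
        return self.fc(out)          # per-frame instrument logits


# Example: score instrument presence in a dummy 16-frame clip.
frames = torch.rand(16, 3, 384, 384)              # T x C x H x W video frames
with torch.no_grad():
    feats = backbone(preprocess(frames))          # (16, 1280) per-frame features
logits = TemporalHead()(feats.unsqueeze(0))       # (1, 16, num_classes)
probs = torch.sigmoid(logits)                     # multi-label presence per frame
```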
