Abstract

The semantic analysis of nasal endoscopic video is challenging because untrimmed surgical video contains a large amount of irrelevant and insignificant information, such as background, blurred, juddering, or blood-stained video fragments. For medical education and research, it is important to automatically identify the start and end points of the valid surgical fragments and to remove the invalid fragments from endoscopic surgery videos. However, the performance of deep-learning-based methods that use a fixed time interval and a sliding window is severely affected when interference appears randomly in the nasal endoscopic video. Specifically, the surgical video is a continuous process globally, but the endoscope frequently entering and exiting the cavity introduces many locally discontinuous fragments. Hence, we propose a multi-granularity semantic analysis framework that meets both the accuracy and the timeliness required for endoscopic surgery video semantic analysis. Our approach is an end-to-end solution. First, a joint model extracts the temporal-spatial features of the surgical video on a coarse-grained scale, while an attention mechanism automatically selects the informative spatial features of the endoscopic video. Second, a hierarchical self-correction module iteratively corrects the boundaries of the surgical operation on a fine-grained scale. Finally, we validate the proposed network through extensive experiments and quantitative comparisons against other state-of-the-art approaches, in which it performs well in terms of both accuracy and efficiency.

Highlights

  • Endoscopic surgery has been increasingly practiced in nasal surgery in recent years because it causes less trauma and allows quicker recovery [1]–[3], and the number of nasal surgery videos has grown continuously

  • A complete endoscopic surgical video is recorded from the beginning of the operation to the end of the operation

  • Continuous surgical operations are interrupted by invalid shots in the endoscopic surgery video


Summary

Introduction

Endoscopic surgery has been increasingly practiced in nasal surgery in recent years because it causes less trauma and allows quicker recovery [1]–[3], and the number of nasal surgery videos has grown continuously. This work provides the first semantic analysis of nasal endoscopic surgery video using a deep learning method. The semantic analysis combines multi-granularity spatial-temporal features with a modeling scheme.
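
The coarse-to-fine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the proposed network: it assumes that some classifier has already produced a per-frame validity score in [0, 1], and the names coarse_segments, refine_segment, window and threshold are hypothetical. Fixed-size windows give coarse segments first, and each boundary is then corrected at frame level.

```python
from statistics import mean


def coarse_segments(scores, window, threshold=0.5):
    """Coarse pass: mark each fixed-size window as valid when its mean
    score reaches the threshold, then merge adjacent valid windows.
    Returns (start, end) frame indices with an exclusive end."""
    segments = []
    start = None
    for w_start in range(0, len(scores), window):
        chunk = scores[w_start:w_start + window]
        valid = mean(chunk) >= threshold
        if valid and start is None:
            start = w_start                     # a valid run begins
        elif not valid and start is not None:
            segments.append((start, w_start))   # the valid run ends
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments


def refine_segment(scores, segment, window, threshold=0.5):
    """Fine pass: correct a coarse segment at frame level, moving each
    boundary by at most one window outward."""
    start, end = segment
    lowest = max(0, start - window)
    highest = min(len(scores), end + window)
    # Grow outward over valid frames that the coarse windows cut off.
    while start > lowest and scores[start - 1] >= threshold:
        start -= 1
    while end < highest and scores[end] >= threshold:
        end += 1
    # Shrink inward over invalid frames left at the edges.
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return (start, end)
```

For example, with scores = [0.1] * 3 + [0.9] * 10 + [0.1] * 3 and window=4, the coarse pass returns [(4, 12)] and the fine pass corrects this to (3, 13), which matches the ten valid frames exactly. The hierarchical self-correction module in the paper is learned and iterative; the fixed threshold here only stands in for that step.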

