With stereo cameras becoming widely used in invasive surgery systems, stereo endoscopic images provide important depth information for delicate surgical tasks. However, the small size of sensors and their limited lighting conditions lead to low-quality and low-resolution endoscopic images and videos. In this paper, we propose a stereo endoscopic video super-resolution method using transformer with a hybrid attention mechanism named HA-VSR. Stereo video SR aims to reconstruct high-resolution (HR) images from corresponding low-resolution (LR) videos. In our method, the stereo correspondence and temporal correspondence are incorporated into the HA-VSR model. Specifically, the Swin transformer architecture is utilized in proposed framework with hybrid attention mechanisms. The parallel attention mechanism is utilized by using the symmetry and consistency of left and right images, and the temporal attention mechanism is utilized by using the consistency of consecutive frames. Detailed quantitative evaluation and experiments on two datasets show the proposed model achieves advanced SR reconstruction performance, showing that the proposed stereo VSR framework outperforms alternative approaches.
Read full abstract