Abstract

Video super-resolution aims to reconstruct a high-resolution video from its low-resolution counterpart at a given magnification scale. As a fundamental computer vision task, video super-resolution is widely used in many fields. In particular, in endoscopy, high-resolution videos help doctors observe finer details of lesions and improve the accuracy and speed of diagnosis. A novel deformable Transformer network is proposed to solve the super-resolution problem for endoscopic video data. To address the inability of the Transformer's self-attention module to effectively capture local information, the self-attention module is augmented with convolution operations that increase its local feature capture capability. To compensate for the Transformer's deficiency in aligning consecutive frames, a new bidirectional deformable convolutional network is designed as the Transformer's feed-forward module, using deformable convolution to achieve frame-to-frame feature alignment and feature propagation. A high-resolution dataset for endoscopic video super-resolution is constructed from endoscopic surgery videos. Extensive experiments comparing existing video super-resolution methods on this endoscopic dataset demonstrate that the proposed deformable Transformer network achieves the best performance in endoscopic imaging so far with a competitive number of parameters: it improves PSNR over the state-of-the-art method by 0.97 dB in the RGB channel while reducing the number of network parameters by 0.39 million.
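The abstract describes augmenting self-attention with convolution so that the module captures both global and local features. The paper's actual architecture is not reproduced here; the following is a minimal NumPy sketch of that general idea, combining a global attention branch over flattened spatial tokens with a depthwise 3x3 convolution branch that injects locality. All function names, weight shapes, and the single-head layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv3x3(x, kernels):
    # x: (H, W, C) feature map; kernels: (3, 3, C), one 3x3 filter per channel
    # Zero padding, stride 1, so the spatial size is preserved
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += padded[i:i + H, j:j + W, :] * kernels[i, j, :]
    return out

def conv_augmented_attention(x, Wq, Wk, Wv, dw_kernels):
    # Global branch: plain single-head self-attention over H*W spatial tokens
    H, W, C = x.shape
    tokens = x.reshape(H * W, C)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(C))
    global_feat = (attn @ v).reshape(H, W, C)
    # Local branch: depthwise convolution recovers neighborhood structure
    # that flattened token attention alone does not model
    local_feat = depthwise_conv3x3(x, dw_kernels)
    return global_feat + local_feat

# Demo with random weights (hypothetical shapes, for illustration only)
rng = np.random.default_rng(0)
H, W, C = 4, 5, 8
x = rng.standard_normal((H, W, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
dw = rng.standard_normal((3, 3, C)) * 0.1
y = conv_augmented_attention(x, Wq, Wk, Wv, dw)  # same (H, W, C) shape as input
```

The two branches are summed so the output keeps the input's spatial resolution, which is what lets such a block drop into a Transformer stage in place of standard self-attention.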
