Abstract

As virtual reality and the metaverse grow in popularity, the Omnidirectional Image (OI) has attracted considerable attention due to its immersive display characteristics. However, users view only a portion of the content through a specific viewport extracted from the panoramic view, which leads to a resolution mismatch: clear near-eye display requires High-Resolution (HR) content within the viewport. Hence, it is necessary to develop a Super-Resolution (SR) solution for reconstructing Low-Resolution (LR) OIs. Unlike 2D SR, the variation of pixel distribution along latitude is a critical factor in designing an Omnidirectional Image Super-Resolution (OISR) scheme. In this paper, we propose a novel end-to-end Transformer and Convolution Collaborative Learning Network (TCCL-Net) for OISR. Firstly, Swin Transformer blocks and residual convolution blocks are employed to capture long-range and short-range dependencies, thereby extracting richer and more heterogeneous features from the two branches. Secondly, to better fuse these two feature streams, cross-guided enhanced attention mechanisms are designed for bidirectional information enhancement over both channel and spatial dimensions. Thirdly, to account for the nonuniform pixel distribution across latitudes, we add an absolute positional encoding to the Swin Transformer to represent patch weights at different positions, and we propose a tile-based panoramic reconstruction module that super-resolves bands with different pixel sampling characteristics across latitudes. Experimental results on two publicly available benchmark datasets demonstrate the superiority of the proposed approach over state-of-the-art methods on the OISR task.
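To make the dual-branch design described above concrete, the sketch below shows one way such a block could be wired together in PyTorch: a transformer branch with a learnable absolute positional encoding for long-range dependencies, a residual convolution branch for short-range dependencies, and a cross-guided fusion in which each branch re-weights the other via channel and spatial attention. This is a minimal illustration under assumptions, not the authors' implementation: a standard TransformerEncoderLayer stands in for true Swin Transformer blocks, the gating modules are simplified stand-ins for the paper's cross-guided enhanced attention, and all module names and sizes are hypothetical.

```python
# Illustrative sketch (assumptions noted above), not the TCCL-Net implementation.
import torch
import torch.nn as nn


class ResidualConvBlock(nn.Module):
    """Short-range branch: two 3x3 convolutions with a residual connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)


class TransformerBranch(nn.Module):
    """Long-range branch: spatial positions become tokens processed by
    self-attention with a learnable absolute positional encoding
    (a stand-in for the latitude-aware encoding described in the abstract)."""

    def __init__(self, channels: int, height: int, width: int, heads: int = 4):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, height * width, channels))
        self.encoder = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, dim_feedforward=2 * channels,
            batch_first=True,
        )

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        tokens = self.encoder(tokens + self.pos)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class CrossGuidedFusion(nn.Module):
    """Bidirectional cross-guidance: each branch produces an attention map
    (channel or spatial) that re-weights the other branch before fusion."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, conv_feat, trans_feat):
        conv_enhanced = conv_feat * self.channel_gate(trans_feat)   # guided by transformer branch
        trans_enhanced = trans_feat * self.spatial_gate(conv_feat)  # guided by convolution branch
        return self.fuse(torch.cat([conv_enhanced, trans_enhanced], dim=1))


if __name__ == "__main__":
    x = torch.randn(1, 32, 16, 32)                     # (B, C, H, W) toy feature map
    conv_branch = ResidualConvBlock(32)
    trans_branch = TransformerBranch(32, height=16, width=32)
    fusion = CrossGuidedFusion(32)
    out = fusion(conv_branch(x), trans_branch(x))
    print(out.shape)                                   # torch.Size([1, 32, 16, 32])
```

In a full OISR pipeline, several such blocks would feed an upsampling head, and the abstract's tile-based reconstruction would apply this processing per latitude band; those stages are omitted here for brevity.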
