Abstract

Lightweight single image super-resolution (SISR) has seen many advances recently. Transformer-based methods have achieved significant improvements over CNN-based methods, driven mainly by the transformer's ability to model long-range dependencies and retain textures in images. However, these transformer-based approaches carry many parameters and are computationally expensive during inference. In this work, we propose SWIFT, a hybrid of transformers and Fast Fourier Convolutions (FFC) for lightweight SISR. We design a novel Dual Spectrum Frequency Block (DSFB) that processes features in both the spatial domain and the Fourier domain, allowing the network to maintain global context in features and extract high-frequency information effectively. Additionally, to mitigate the frequency-erasing nature of transformers, we introduce SwinV2+ transformers, which use attention scaling to promote high-frequency information. Experimental results on popular benchmark datasets show that SWIFT outperforms state-of-the-art transformer-based methods for lightweight SISR while using 34% fewer parameters and running up to 60% faster during inference.
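The abstract does not include implementation details, but the dual-domain idea behind the DSFB can be illustrated with a minimal PyTorch-style sketch: one branch applies an ordinary spatial convolution, while a second branch mixes channels in the Fourier domain (in the spirit of FFC) before the two are fused. The module name, channel counts, and fusion scheme below are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch: a dual-domain block that processes features in both the
# spatial domain (3x3 convolution) and the Fourier domain (FFT-based
# channel mixing), loosely illustrating the DSFB idea. The exact SWIFT
# design (normalization, activations, channel splits) is not shown here.
import torch
import torch.nn as nn


class DualDomainBlock(nn.Module):  # hypothetical name, not the paper's DSFB
    def __init__(self, channels: int):
        super().__init__()
        # Spatial branch: local texture via a standard 3x3 convolution.
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Spectral branch: 1x1 convolution over the real/imaginary parts of
        # the 2D FFT, giving every output position a global receptive field.
        self.spectral = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        # Fusion of the two branches back to the original channel count.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Spatial path.
        s = self.spatial(x)
        # Fourier path: rFFT over spatial dims, mix channels, inverse rFFT.
        f = torch.fft.rfft2(x, norm="ortho")
        f = torch.cat([f.real, f.imag], dim=1)
        f = self.spectral(f)
        real, imag = f.chunk(2, dim=1)
        f = torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
        # Concatenate both domains, fuse, and add a residual connection.
        return x + self.fuse(torch.cat([s, f], dim=1))


# Usage: a 64-channel feature map of a 48x48 low-resolution patch.
if __name__ == "__main__":
    block = DualDomainBlock(64)
    feats = torch.randn(1, 64, 48, 48)
    print(block(feats).shape)  # torch.Size([1, 64, 48, 48])
```

Because the spectral branch operates on the full frequency representation of the feature map, every output element can depend on the entire input, which is how such a block can preserve global context without the cost of global self-attention.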
