Abstract

Recently, transformer-based methods have achieved impressive results in single image super-resolution (SISR). However, the lack of a locality mechanism and their high computational complexity limit their application. To address these problems, we propose a new method, the Efficient Mixed Transformer (EMT). Specifically, we propose the Mixed Transformer Block (MTB), consisting of multiple consecutive transformer layers, in some of which the Pixel Mixer (PM) replaces Self-Attention (SA). PM enhances local knowledge aggregation through pixel shifting operations, and it introduces no additional complexity, as it has no parameters and requires no floating-point operations. Moreover, we develop striped window SA (SWSA) to achieve efficient global dependency modeling by exploiting image anisotropy. Experimental results show that EMT outperforms existing methods on benchmark datasets and achieves state-of-the-art performance.
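To make the Pixel Mixer concrete, the following is a minimal PyTorch sketch of a parameter-free pixel-shifting mixer: the channels are split into four groups, and each group is circularly shifted by one pixel in one of the four cardinal directions, so every output pixel aggregates features from its spatial neighbours. The four-way split, the shift size, and the use of circular (wrap-around) shifts are illustrative assumptions here, not necessarily the paper's exact configuration.

```python
import torch


def pixel_mixer(x: torch.Tensor, shift: int = 1) -> torch.Tensor:
    """Parameter-free local mixing via spatial shifts of channel groups.

    Sketch under assumptions: input is (B, C, H, W) with C divisible by 4;
    each quarter of the channels is rolled `shift` pixels in one cardinal
    direction. Any leftover channels are passed through unshifted.
    """
    b, c, h, w = x.shape
    g = c // 4  # channels per direction group (assumed even split)
    out = x.clone()
    out[:, 0 * g:1 * g] = torch.roll(x[:, 0 * g:1 * g], shifts=shift, dims=3)   # right
    out[:, 1 * g:2 * g] = torch.roll(x[:, 1 * g:2 * g], shifts=-shift, dims=3)  # left
    out[:, 2 * g:3 * g] = torch.roll(x[:, 2 * g:3 * g], shifts=shift, dims=2)   # down
    out[:, 3 * g:4 * g] = torch.roll(x[:, 3 * g:4 * g], shifts=-shift, dims=2)  # up
    return out


# Usage: mixing is a pure memory permutation, so it adds no parameters or FLOPs.
y = pixel_mixer(torch.randn(1, 64, 32, 32))
```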
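Striped window SA can likewise be sketched as standard multi-head attention restricted to non-square windows, so attention cost grows with the strip area rather than the full image. The 4x16 window shape, head count, and omission of relative position bias below are assumptions made for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn


class StripedWindowSA(nn.Module):
    """Multi-head self-attention inside non-square (striped) windows.

    Sketch under assumptions: H and W are divisible by the window shape,
    and each (wh x ww) strip attends only within itself.
    """

    def __init__(self, dim: int, window=(4, 16), heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        wh, ww = self.window
        # Partition (B, C, H, W) into (B * n_windows, wh * ww, C) strips.
        x = x.view(b, c, h // wh, wh, w // ww, ww)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, wh * ww, c)
        x, _ = self.attn(x, x, x)  # attention restricted to each strip
        # Reverse the partition back to (B, C, H, W).
        x = x.view(b, h // wh, w // ww, wh, ww, c)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return x
```

Alternating horizontal and vertical strips across layers is one natural way such a design could exploit image anisotropy while still covering both axes.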
