Abstract

Blind Image Quality Assessment (BIQA) is a challenging, unsolved research topic that is crucial for analyzing, understanding, and improving visual experience. Recently, transformer-based BIQA models have drawn increasing attention due to their powerful capacity for modeling global dependencies among tokens. However, existing works tend to apply the self-attention mechanism to explore spatial dependencies while neglecting the impact of channel-wise self-attention. In this paper, we explore the feasibility of incorporating the attention mechanism in a channel-wise manner for BIQA. By systematically studying the interactions between channel-wise and spatial-wise attention, we propose an adaptive spatial and channel attention merging Transformer (ASCAM-Former) that aggregates both spatial-wise and channel-wise attention information. In addition, to accommodate IQA datasets containing both image and patch quality labels, an image-to-patch weight sharing (I2PWS) scheme is designed to exploit local quality learning to reinforce the learning of global quality, and vice versa. The experimental results indicate that the channel-wise attention mechanism is as competitive as its spatial-wise counterpart for IQA tasks, and that the proposed ASCAM-Former yields accurate predictions on both authentically and synthetically distorted image quality datasets.
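To make the spatial-wise versus channel-wise distinction concrete, the following PyTorch sketch contrasts the two attention variants on the same token tensor. This is an illustrative formulation only, not the authors' ASCAM-Former implementation: the module names are hypothetical, and the L2 normalization in the channel branch is one common way to stabilize a C x C attention map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    """Standard self-attention over spatial tokens: the N x N
    attention map relates every token to every other token."""
    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                # x: (B, N, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale    # (B, N, N)
        return self.proj(attn.softmax(dim=-1) @ v)

class ChannelSelfAttention(nn.Module):
    """Channel-wise variant: queries and keys are transposed so the
    attention map is C x C, modeling dependencies between feature
    channels rather than between spatial positions."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                # x: (B, N, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = F.normalize(q.transpose(-2, -1), dim=-1)     # (B, C, N)
        k = F.normalize(k.transpose(-2, -1), dim=-1)     # (B, C, N)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)  # (B, C, C)
        out = (attn @ v.transpose(-2, -1)).transpose(-2, -1)  # (B, N, C)
        return self.proj(out)
```

Under this sketch, an adaptive merging module in the spirit of ASCAM-Former would run both branches on the same tokens and combine their outputs with learned weights; the exact merging strategy used in the paper is described in its method section.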
