Abstract

Current saliency prediction models based on convolutional neural networks (CNNs) achieve solid improvements in predicting human attention on omnidirectional images (ODIs). However, models that employ standard convolution have two main shortcomings: they are content-agnostic and computation-intensive. To address both shortcomings, we propose a decoupled dynamic group equivariant filter (DDGF). Specifically, inspired by attention mechanisms that adopt lightweight branches to estimate spatial and channel attention, we decouple group equivariant convolution (i.e., p4 convolution) into spatial and channel dynamic group equivariant filters. This design not only makes the p4 convolution filter adaptive to ODI content but also considerably reduces computational cost. To the best of our knowledge, DDGF is the first decoupled dynamic convolution filter applied to the task of saliency prediction. Moreover, we observe that replacing standard group equivariant convolution with DDGF is both effective and efficient for ODI saliency prediction. Experimental results show that the proposed DDGF achieves superior performance in comparison with other state-of-the-art methods. Additionally, we conduct ablation experiments to verify the effectiveness of each component of DDGF.
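To illustrate the decoupling idea described above, the following is a minimal NumPy sketch, not the authors' implementation: a channel branch predicts per-channel kernel weights from globally pooled features, a spatial branch predicts per-position kernel weights, and their product forms a content-adaptive kernel at each location. The branch weights here are random stand-ins for learned parameters, and the group (p4 rotation) equivariance of the actual DDGF is omitted for brevity. Note the parameter cost: the two branches together need O(C·K² + K²) outputs per position instead of the O(C²·K²) of a fully dynamic standard convolution, which is the source of the claimed computational savings.

```python
import numpy as np

rng = np.random.default_rng(0)

def decoupled_dynamic_filter(x, k=3):
    """Hypothetical sketch of a decoupled dynamic filter.

    x: feature map of shape (channels, height, width).
    Returns a feature map of the same shape, filtered with a
    per-position kernel formed as channel_filter * spatial_filter.
    """
    c, h, w = x.shape
    # Channel branch: global average pool -> linear map -> per-channel k*k weights.
    pooled = x.mean(axis=(1, 2))                    # (c,)
    w_ch = rng.standard_normal((c * k * k, c))      # stand-in for learned weights
    ch_filter = (w_ch @ pooled).reshape(c, k * k)   # (c, k*k), shared over positions
    # Spatial branch: 1x1 conv -> per-position k*k weights, shared over channels.
    w_sp = rng.standard_normal((k * k, c))          # stand-in for learned weights
    sp_filter = np.einsum('oc,chw->ohw', w_sp, x)   # (k*k, h, w)
    # Combine: at each position, the dynamic kernel is the product of both branches.
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + k, j:j + k].reshape(c, k * k)
            kernel = ch_filter * sp_filter[:, i, j]  # (c, k*k), content-adaptive
            out[:, i, j] = (patch * kernel).sum(axis=1)
    return out

x = rng.standard_normal((4, 8, 8))
y = decoupled_dynamic_filter(x)
print(y.shape)  # (4, 8, 8)
```

The explicit per-position loop is for clarity only; a practical implementation would vectorize it (e.g., via an unfold/im2col operation in a deep learning framework).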
