Abstract

At present, most high-performing saliency prediction models for omnidirectional images (ODIs) depend on deeper or wider convolutional neural networks (CNNs), benefiting from their superior feature representation capability but suffering from high computational costs. To address this issue, we propose a novel lightweight saliency prediction model to predict eye fixations on ODIs. Specifically, our proposed model consists of three modules: a lightweight feature representation module, a supervised attention module, and a dynamic convolution aggregation module. Unlike existing saliency prediction models, our proposed model is the first to introduce dynamic convolution into saliency prediction, dynamically aggregating multiple parallel convolution kernels according to their attention weights. Such a dynamic convolution operation is not only computationally efficient (small kernel size) but also increases feature representation capability, since the convolution kernels are aggregated in a non-linear manner via attention. Experimental results on two benchmark datasets show that our model is lightweight and outperforms other state-of-the-art methods.
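The dynamic convolution aggregation described above can be illustrated with a minimal sketch: a bank of K small kernels is collapsed into a single kernel using input-dependent attention weights (here a softmax over a globally pooled descriptor), and the aggregated kernel is then applied as an ordinary convolution. This is a simplified single-channel illustration of the general idea, not the authors' implementation; the attention projection `w_att` and the temperature are hypothetical placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dynamic_conv(x, kernels, w_att, temperature=1.0):
    """Aggregate K parallel kernels via input-dependent attention, then convolve.

    x        : (H, W) single-channel input (illustrative)
    kernels  : (K, k, k) bank of K small convolution kernels
    w_att    : (K,) hypothetical attention projection applied to the
               globally pooled input descriptor
    """
    # Global average pooling yields a scalar descriptor of the input.
    pooled = x.mean()
    # Softmax attention over the K kernels; the softmax makes the
    # aggregation non-linear in the input, unlike a fixed kernel sum.
    att = softmax(w_att * pooled / temperature)
    # Collapse the kernel bank into one aggregated kernel.
    k_agg = np.tensordot(att, kernels, axes=1)  # shape (k, k)
    # Plain 'valid' cross-correlation with the aggregated kernel.
    k = k_agg.shape[0]
    H, W = x.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + k, j:j + k] * k_agg).sum()
    return out, att
```

Because only one small aggregated kernel is convolved per input, the cost at inference stays close to that of a single convolution, while the attention over the bank adapts the effective kernel to each input.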
