Abstract

Understanding the visual attention of observers viewing omni-directional images has attracted growing interest with the rise of virtual reality applications. In this paper, we propose ACSalNet, a novel attentive and context-aware network for saliency prediction on omni-directional images. To address the insufficient receptive field of high-level features, we first introduce a Deformable Attention Bottleneck (DAB), which strengthens the high-level feature extractor and focuses the model's limited receptive field on key regions. To narrow the semantic gap between features at different levels and to introduce context-aware information, we then design a Context-aware Feature Pyramid Module (CFPM). Finally, at test time, we propose a novel projection scheme, Multiple Sphere Rotation (MSR), which reduces the error of predicting directly on equirectangular images while preserving their integrity. Extensive experiments show that the proposed method outperforms state-of-the-art models under multiple evaluation metrics on public saliency benchmarks.
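
The abstract states the goal of Multiple Sphere Rotation but not its mechanics. Below is a minimal sketch of one plausible realization, assuming MSR rotates the viewing sphere several times, runs the saliency predictor on each re-projected equirectangular image, rotates the resulting maps back, and averages them. The function names (`rotate_equirect`, `msr_predict`), the placeholder `predict_fn`, and the yaw angles are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.spatial.transform import Rotation as R

def rotate_equirect(img, rot, order=1):
    """Resample a (H, W) equirectangular map after rotating the viewing sphere by `rot`."""
    h, w = img.shape
    # Longitude/latitude of every output pixel center.
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit direction vectors, mapped back onto the source sphere.
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=-1)
    xyz = rot.inv().apply(xyz.reshape(-1, 3)).reshape(h, w, 3)
    src_lon = np.arctan2(xyz[..., 1], xyz[..., 0])
    src_lat = np.arcsin(np.clip(xyz[..., 2], -1.0, 1.0))
    # Back to pixel coordinates; bilinear sampling with wrap-around at the seam.
    col = (src_lon + np.pi) / (2 * np.pi) * w - 0.5
    row = (np.pi / 2 - src_lat) / np.pi * h - 0.5
    return map_coordinates(img, [row, col], order=order, mode='wrap')

def msr_predict(image, predict_fn, yaw_angles_deg=(0, 90, 180, 270)):
    """Average saliency predictions over several sphere rotations (hypothetical MSR reading).

    image: (H, W, C) equirectangular input.
    predict_fn: callable returning a (H, W) saliency map for an equirectangular image.
    """
    h, w = image.shape[:2]
    acc = np.zeros((h, w), dtype=np.float64)
    for yaw in yaw_angles_deg:
        rot = R.from_euler('z', yaw, degrees=True)          # rotate about the polar axis
        rotated = np.stack([rotate_equirect(image[..., c], rot)
                            for c in range(image.shape[-1])], axis=-1)
        sal = predict_fn(rotated)                            # saliency on the rotated view
        acc += rotate_equirect(sal, rot.inv())               # undo the rotation before fusing
    acc /= len(yaw_angles_deg)
    return acc / (acc.max() + 1e-8)
```

Under this reading, every prediction is still made on a full equirectangular frame (preserving its integrity), while the regions most affected by projection distortion land in different image locations on each pass, so averaging the rotated-back maps can damp projection-dependent errors.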
