Abstract

We present a model for saliency prediction in 360° videos based on the Transformer architecture. Our model leverages the global attention mechanism to model the temporal dependencies that drive human attention. We compare our model with a current state-of-the-art model and outperform it on all measured metrics.
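As a rough illustration of the global attention mechanism mentioned above, the sketch below implements plain scaled dot-product self-attention over a sequence of per-frame feature vectors in NumPy. This is a generic, hypothetical sketch of the attention operation, not the paper's actual architecture; the shapes and the single-head, projection-free form are assumptions for clarity.

```python
import numpy as np

def global_self_attention(x):
    """Scaled dot-product self-attention over a sequence of frame features.

    x: (T, d) array, one (hypothetical) feature vector per video frame.
    Returns (T, d): each output row is a weighted mixture of ALL frames,
    which is how global attention can capture temporal dependencies.
    """
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                 # (T, T) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ x                            # attention-weighted mixture

# Toy usage: 4 frames, 8-dimensional features
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8))
out = global_self_attention(frames)
print(out.shape)  # (4, 8)
```

Because every output row attends to every input frame, no fixed temporal window limits how far apart two related frames can be, unlike recurrent or convolutional temporal models.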
