Abstract
The paper presents a deep-learning model for computing saliency maps for video content. Classic saliency algorithms use only the spatial information in an image to generate a saliency map highlighting regions of higher importance, that is, regions where people are more likely to direct their gaze and attention. Algorithms for video stimuli additionally incorporate temporal information about object motion from frame to frame, which leads to more complex three-dimensional architectures. The paper analyses existing models and proposes a new model based on one of them. The model's performance is compared with the literature using four widely accepted measures (AUC, NSS, SIM, and CC); it is comparable to, and in many cases better than, previously published models. Moreover, owing to improvements in the architecture, it is significantly faster in terms of frames processed per second.
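For readers unfamiliar with the evaluation measures named above, the sketch below shows the standard definitions of NSS, CC, and SIM as they are commonly computed in the saliency literature; it is a minimal illustration assuming NumPy arrays for the predicted map, the binary fixation map, and the fixation density map, not the paper's actual evaluation code.

```python
import numpy as np

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean of the standardized saliency
    map at human fixation locations (fixations is a boolean array of
    the same shape as the saliency map)."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-8)
    return s[fixations].mean()

def cc(saliency, density):
    """Linear Correlation Coefficient: Pearson correlation between the
    predicted saliency map and the ground-truth fixation density map."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-8)
    d = (density - density.mean()) / (density.std() + 1e-8)
    return (s * d).mean()

def sim(saliency, density):
    """Similarity: histogram intersection of the two maps after each
    is normalized to sum to 1; equals 1 for identical distributions."""
    s = saliency / (saliency.sum() + 1e-8)
    d = density / (density.sum() + 1e-8)
    return np.minimum(s, d).sum()
```

Higher values indicate better agreement with human fixations for all three measures; AUC (not shown) is obtained by treating fixated pixels as positives and computing the area under the ROC curve of the saliency values.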