Abstract

We present a review of methods for automatic estimation of visual saliency: the perceptual property that makes specific elements in a scene stand out and grab the attention of the viewer. We focus on domains that are especially recent and relevant, as they make saliency estimation particularly useful and/or effective: omnidirectional images, image groups for co-saliency, and video sequences. For each domain, we select recent methods, highlight their commonalities and differences, and describe their distinctive approaches. We also report and analyze the datasets involved in the development of such methods, in order to reveal additional peculiarities of each domain, such as the representation used for the ground truth saliency information (scanpaths, saliency maps, or salient object regions). We define domain-specific evaluation measures, and provide quantitative comparisons on the basis of common datasets and evaluation criteria, highlighting the different impact of existing approaches on each domain. We conclude by synthesizing the emerging directions for research in the specialized literature, which include novel representations for omnidirectional images, inter- and intra-image saliency decomposition for co-saliency, and saliency shift for video saliency estimation.

Highlights

  • Visual saliency is defined as a property of a scene in relation to an observer

  • We provide a selection of the evaluation measures most commonly used by the saliency estimation methods analyzed in this paper

  • Methods for saliency estimation in omnidirectional images are evaluated with a variety of measures, most of which are common to visual saliency in traditional images, such as the Pearson correlation coefficient (CC) [29,33,36,61,62,63,64] and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) [34,36,61,62,63,64]
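
As an illustration of the two measures named above, the following is a minimal sketch, not the evaluation code of any reviewed method. It assumes a predicted saliency map and a ground truth given as 2D arrays of equal shape; the function names `pearson_cc` and `auc_score` are hypothetical. The AUC here is the plain rank-based (Mann-Whitney) formulation; the saliency literature also uses variants that sample thresholds or negatives differently, and omnidirectional evaluations may additionally weight pixels to compensate for projection distortion, which this sketch does not do.

```python
import numpy as np


def pearson_cc(pred, gt):
    """Pearson correlation coefficient between two saliency maps."""
    p = np.asarray(pred, dtype=float).ravel()
    g = np.asarray(gt, dtype=float).ravel()
    p = p - p.mean()
    g = g - g.mean()
    denom = np.sqrt((p ** 2).sum() * (g ** 2).sum())
    # A constant map has zero variance; return 0.0 instead of dividing by zero.
    return float((p * g).sum() / denom) if denom > 0 else 0.0


def auc_score(pred, fixations):
    """Area under the ROC curve, computed as the probability that a fixated
    pixel receives a higher predicted saliency than a non-fixated pixel
    (ties count as one half)."""
    s = np.asarray(pred, dtype=float).ravel()
    f = np.asarray(fixations).ravel().astype(bool)
    pos, neg = s[f], s[~f]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one fixated and one non-fixated pixel")
    # Pairwise comparison is quadratic in memory; fine for small maps only.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


if __name__ == "__main__":
    pred = np.array([[0.9, 0.1], [0.2, 0.8]])
    fix = np.array([[1, 0], [0, 1]])
    print(pearson_cc(pred, pred), auc_score(pred, fix))
```

CC compares two continuous maps, whereas AUC treats the prediction as a binary classifier of fixated versus non-fixated locations, so the two can rank methods differently.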


Summary

Introduction

Visual saliency is defined as a property of a scene in relation to an observer. This follows from a commonly accepted interpretation [1,2,3] that defines it as the set of subjective and perceptual attributes that make certain items stand out from their surroundings and grab the viewer’s attention. In the vision system of human beings and other animals, two components typically contribute to the overall saliency: bottom-up and top-down factors [4]. Bottom-up saliency is driven by low-level activations in the vision system, based for example on pre-attentive computational mechanisms in the primary visual cortex [5], and does not depend on specific tasks and objectives. Top-down saliency is defined as being goal-directed [6]; as such, it is highly dependent on the intrinsic biases of the observer and correlated with the semantics of the depicted elements. Scientific literature reviews for automatic visual saliency estimation often adopt these two categories to classify existing methods [2].
