Abstract

Salient object detection or salient region detection models, diverging from fixation prediction models, have traditionally been dealing with locating and segmenting the most salient object or region in a scene. While the notion of most salient object is sensible when multiple objects exist in a scene, current datasets for evaluation of saliency detection approaches often have scenes with only one single object. We introduce three main contributions in this paper. First, we take an in-depth look at the problem of salient object detection by studying the relationship between where people look in scenes and what they choose as the most salient object when they are explicitly asked. Based on the agreement between fixations and saliency judgments, we then suggest that the most salient object is the one that attracts the highest fraction of fixations. Second, we provide two new less biased benchmark data sets containing scenes with multiple objects that challenge existing saliency models. Indeed, we observed a severe drop in performance of eight state-of-the-art models on our data sets (40%-70%). Third, we propose a very simple yet powerful model based on superpixels to be used as a baseline for model evaluation and comparison. While on par with the best models on MSRA-5 K data set, our model wins over other models on our data highlighting a serious drawback of existing models, which is convoluting the processes of locating the most salient object and its segmentation. We also provide a review and statistical analysis of some labeled scene data sets that can be used for evaluating salient object detection models. We believe that our work can greatly help remedy the over-fitting of models to existing biased data sets and opens new venues for future research in this fast-evolving field.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call