Abstract

Bottom-up saliency models identify the salient regions of an image based on features such as color, intensity, and orientation. These models are typically used as predictors of human visual behavior and for computer vision tasks. In this paper, we conduct a systematic evaluation of the saliency maps computed with four selected bottom-up models on images of urban and highway traffic scenes. Saliency both over whole images and at the object level is investigated and characterized in terms of the energy and the entropy of the saliency maps. We identify significant differences with respect to the amount, size, and shape complexity of the salient areas computed by different models. Based on these findings, we analyze the likelihood that object instances fall within the salient areas of an image and investigate the agreement between the segments of traffic participants and the saliency maps of the different models. The overall and object-level analysis provides insight into the distinctive features of the salient areas identified by different models, which can serve as selection criteria for prospective applications in autonomous driving such as object detection and tracking.

Highlights

  • Visual attention is the mechanism by which human beings can selectively process salient stimuli

  • A post-hoc Tukey test [25] revealed that all the pairwise differences are significant (p < 0.001). These results indicate that the number of salient regions and their shape complexity differ across models.

  • The entropy values revealed that the saliency produced by the Spectral Residual (SR) model is distributed over several areas, whereas the saliency produced by the Graph-Based Visual Saliency (GBVS) model tends to be concentrated in a single zone.
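The entropy measure underlying the highlight above (together with the energy measure mentioned in the abstract) can be sketched as follows. This is a minimal illustration assuming the saliency map is normalized to a probability distribution over pixels; the function names and the exact normalization are our own assumptions, not the paper's implementation.

```python
import numpy as np

def saliency_entropy(saliency_map, eps=1e-12):
    """Shannon entropy (in bits) of a saliency map treated as a
    probability distribution over pixels. Low values indicate
    saliency concentrated in a single zone (GBVS-like behavior);
    high values indicate saliency spread over several areas
    (SR-like behavior)."""
    p = saliency_map / (saliency_map.sum() + eps)
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def saliency_energy(saliency_map):
    """Energy of a saliency map: the sum of squared saliency values."""
    return float((saliency_map ** 2) .sum())
```

For example, a map with all saliency in one pixel has entropy near 0, while a uniform map over an 8x8 image has entropy log2(64) = 6 bits.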


Summary

Introduction

Visual attention is the mechanism by which human beings selectively process salient stimuli. Saliency is driven by both top-down and bottom-up factors: top-down refers to the cognitive factors of the observer that determine whether an object or a region of the visual field is salient, whereas bottom-up refers to stimulus-driven properties of the image itself. Bottom-up factors have been extensively studied in the literature, and many computational models to identify salient regions have been proposed [1]. Depending on the computational mechanisms involved, as well as the features or cues used to detect saliency, different bottom-up models identify the salient areas of an image differently (see Section 2). An area identified as salient by one model can be regarded as non-salient by another. These differences are illustrated by the saliency maps of a single image generated by different bottom-up models.
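As a concrete illustration of one of the compared bottom-up models, the Spectral Residual (SR) method can be sketched in a few lines: the residual of the log amplitude spectrum, recombined with the original phase, is transformed back to the spatial domain to yield a saliency map. This is a simplified NumPy sketch, not the authors' implementation; the box-filter smoothing stands in for the Gaussian smoothing of the original method, and the function names are our own.

```python
import numpy as np

def _mean3x3(a):
    """3x3 box (mean) filter with edge padding."""
    p = np.pad(a, 1, mode="edge")
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def spectral_residual_saliency(image):
    """Saliency map via the Spectral Residual approach:
    saliency arises from the part of the log amplitude spectrum
    that deviates from its local average."""
    f = np.fft.fft2(image)
    log_amp = np.log(np.abs(f) + 1e-12)       # avoid log(0)
    phase = np.angle(f)
    residual = log_amp - _mean3x3(log_amp)    # the spectral residual
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = _mean3x3(_mean3x3(saliency))   # box smoothing in place of a Gaussian
    saliency -= saliency.min()
    if saliency.max() > 0:
        saliency /= saliency.max()            # normalize to [0, 1]
    return saliency
```

Running this and a model such as GBVS on the same traffic image would yield the differing saliency maps the paragraph above describes.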
