Abstract

Under fast viewing conditions, the visual system extracts salient and simplified representations of complex visual scenes. Saccadic eye movements optimize such visual analysis through the dynamic sampling of the most informative and salient regions in the scene. However, a general definition of saliency, as well as its role in natural active vision, is still a matter of debate. Following the general idea that visual saliency may be based on the amount of local information, a recent constrained maximum-entropy model of early vision, applied to natural images, extracts a set of local optimal information carriers as candidate salient features. These optimal features proved to be more informative than others in fast vision when embedded in simplified sketches of natural images. In the present study, for the first time, these features were presented in isolation, to investigate whether they can be visually more salient than other, non-optimal features even in the absence of any meaningful global arrangement (contour, line, etc.). In four psychophysics experiments, fast discriminability of a compound of optimal features (target) relative to a similar compound of non-optimal features (distractor) was measured as a function of their number and contrast. Results showed that the saliency predictions of the constrained maximum-entropy model are well supported by the data, even when the optimal features are presented in smaller numbers or at lower contrast. In the eye-movement experiment, the target and distractor compounds were presented in the periphery at different angles, and participants performed a simple choice-saccade task. Results showed that saccades can select informative optimal features spatially interleaved with non-optimal features, even at the shortest latencies. Saccade choice accuracy and landing-position precision improved with the signal-to-noise ratio (SNR).
In conclusion, the optimal features predicted by the reference model turn out to be more salient than others, despite the lack of any cues from a globally meaningful structure, suggesting that they receive preferential treatment during fast image analysis. Moreover, fast peripheral visual processing of these informative local features is able to guide gaze orientation. We speculate that active vision is efficiently adapted to maximize information in natural visual scenes.

Highlights

  • The visual system needs to analyze the visual scene efficiently in a short time—in the order of 10 ms, as fast image recognition is crucial for survival (Hare, 1973)

  • Apparatus and Set-Up: All stimuli were programmed on an ACER computer running Windows 7 with Matlab 2016b, using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007), and displayed on a gamma-corrected CRT Silicon Graphics monitor with 1,280 × 960 pixels resolution at 120 Hz refresh rate

  • Both psychophysical and eye-movement data confirm the results of Experiment 2, with a smaller set of data and at a slightly larger eccentricity (5° instead of 3°)

Introduction

The visual system needs to analyze the visual scene efficiently in a short time—in the order of 10 ms, as fast image recognition is crucial for survival (Hare, 1973). A considerable amount of energy is required to create an accurate representation of the visual scene in the shortest possible time (Attwell and Laughlin, 2001; Lennie, 2003; Echeverri, 2006). For this reason, the visual system is likely to operate a strong data reduction at an early stage of processing (Attneave, 1954; Barlow, 1961; Olshausen and Field, 1996), by creating a compact summary of the relevant features (Marr, 1982; Morgan, 2011). The saliency related to each individual visual property of a single stimulus is typically combined into a global percept of stimulus saliency, and different stimuli, defined by different conspicuous properties (e.g., a red square among green squares, or a tilted line among horizontal lines), can be compared and eventually empirically matched in terms of saliency (Nothdurft, 2000).
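The general idea that saliency tracks local information content can be sketched with a toy example: score each image patch by the Shannon entropy of its intensity histogram, so that patches carrying more local information stand out. This is only an illustrative stand-in, not the constrained maximum-entropy model discussed here; the function name and its parameters (`patch`, `bins`) are our own choices.

```python
import numpy as np

def local_entropy_saliency(image, patch=8, bins=16):
    """Toy saliency map: each patch is scored by the Shannon entropy of its
    intensity histogram (more local information -> higher saliency).
    Illustrative only; NOT the paper's constrained maximum-entropy model."""
    h, w = image.shape
    sal = np.zeros((h // patch, w // patch))
    for i in range(sal.shape[0]):
        for j in range(sal.shape[1]):
            block = image[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            counts, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            p = counts / counts.sum()
            p = p[p > 0]  # drop empty bins (0 * log 0 := 0)
            sal[i, j] = -(p * np.log2(p)).sum()
    return sal

# Example: a uniform field with one textured (noisy) region
rng = np.random.default_rng(0)
img = np.full((32, 32), 0.5)
img[8:16, 8:16] = rng.random((8, 8))  # high-information region
sal = local_entropy_saliency(img)
i, j = np.unravel_index(sal.argmax(), sal.shape)
print(i, j)  # the textured patch (1, 1) scores highest
```

Uniform patches collapse into a single histogram bin and receive zero entropy, while the textured patch spreads its intensities across many bins, so it dominates the saliency map.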
