Abstract
Itti and Koch’s Saliency Model has been used extensively to simulate fixation selection in a variety of tasks, from visual search to simple reaction times. Although the Saliency Model has been tested for the spatial accuracy of its fixation predictions, it has not been well tested for their temporal accuracy. Visual tasks, like search, invariably result in a positively skewed distribution of saccadic reaction times over large numbers of samples, yet we show that the leaky integrate-and-fire (LIF) neuronal model included in the classic implementation of the model tends to produce a distribution shifted to shorter fixations (in comparison with human data). Further, while parameter optimization using a genetic algorithm and the Nelder–Mead method does improve the fit of the resulting distribution, it is still unable to match the temporal distributions of human responses in a visual task. Analysis of times for individual images reveals that the LIF algorithm produces initial fixation durations that are fixed rather than sampled from a distribution (as in the human case). Only by aggregating responses over many input images does the model produce a distribution, and the form of this distribution still depends on the input images used to create it rather than on internal model variability.
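As a concrete illustration of the kind of parameter fitting described above, the following is a minimal, self-contained sketch (not the authors' code) that tunes two illustrative LIF parameters, the membrane time constant and the firing threshold, with SciPy's Nelder–Mead routine so that simulated first-spike latencies better match a target reaction-time distribution. The "human" latencies, per-image drive values, and the Kolmogorov–Smirnov loss are all placeholders chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Placeholder "human" saccadic reaction times (ms); a real fit would use
# measured latencies from the visual task.
human_rt = rng.gamma(shape=4.0, scale=60.0, size=500)

# Illustrative per-image drive strengths standing in for peak saliency values.
drives = rng.uniform(1.2, 2.0, size=200)

def first_spike_time(i_in, tau, threshold, dt=1.0, max_t=2000):
    """Time (ms) at which a single LIF unit driven by constant input i_in
    first crosses threshold, or NaN if it never fires."""
    v = 0.0
    for step in range(1, max_t + 1):
        v += (-v + i_in) * dt / tau  # leaky integration toward i_in
        if v >= threshold:
            return step * dt
    return np.nan

def loss(params):
    """Kolmogorov-Smirnov distance between simulated and target latencies."""
    tau, threshold = params
    if tau <= 0 or threshold <= 0:
        return 1.0  # penalize invalid parameter sets
    sim = np.array([first_spike_time(d, tau, threshold) for d in drives])
    if np.isnan(sim).any():
        return 1.0  # penalize parameter sets where some units never fire
    return ks_2samp(sim, human_rt).statistic

result = minimize(loss, x0=[20.0, 1.0], method="Nelder-Mead")
print("fitted tau, threshold:", result.x, "KS distance:", result.fun)
```

Note that, because each drive value yields a deterministic latency in this sketch, the simulated distribution arises entirely from variability across images, mirroring the behaviour reported above.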
Highlights
Despite limits to the processing capacity of the human visual system, we are quick to make sensible interpretations of incoming visual information.
Dominant models of bottom-up attention in tasks like inspection and visual search rely on the idea that visual saliency influences where we attend; in other words, they assume that properties of a visual stimulus stand out against properties of other environmental stimuli and capture our attention [3,4,5,6].
This concept is based on the feature integration theory (FIT) of attention [7], which states that, at an early ‘pre-attentive’ processing stage, features are registered in parallel across the whole visual field.
Summary
Despite limits to the processing capacity of the human visual system, we are quick to make sensible interpretations of incoming visual information. Dominant models of bottom-up attention in tasks like inspection and visual search rely on the idea that visual saliency influences where we attend; in other words, they assume that properties of a visual stimulus stand out against properties of other environmental stimuli and capture our attention [3,4,5,6]. This concept is based on the feature integration theory (FIT) of attention [7], which states that, at an early ‘pre-attentive’ processing stage, features are registered in parallel across the whole visual field and encoded along a number of perceptual dimensions (orientation, color, spatial frequency, brightness, etc.), and that, at a later ‘attentive’ stage, they are combined into a perceived object with the help of attention [4,7]. This concept can be implemented as a computational model. The final layer of these models of fixation selection is typically implemented as a winner-take-all (WTA) network of neurons using a leaky integrate-and-fire (LIF) model, a neuronal activation model able to predict neuronal spikes [8].
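To make the WTA/LIF stage concrete, here is a minimal sketch (not the published implementation): each location of a saliency map drives one leaky integrate-and-fire unit, and the first unit to reach threshold determines both where the model ‘fixates’ and when. All parameter values and the toy saliency map are illustrative.

```python
import numpy as np

def wta_first_spike(saliency_map, tau=0.02, threshold=1.0, dt=1e-3,
                    n_steps=5000):
    """Race a grid of LIF units, one per saliency-map location, and return
    the location and time (s) of the first spike (the WTA 'winner')."""
    v = np.zeros_like(saliency_map, dtype=float)
    for step in range(1, n_steps + 1):
        # Each unit leaks toward zero and integrates its saliency value.
        v += (-v + saliency_map) * dt / tau
        if (v >= threshold).any():
            winner = np.unravel_index(np.argmax(v), v.shape)
            return winner, step * dt
    return None, None  # no unit reached threshold in the simulated window

# Example: a toy 3x3 saliency map; the most salient location wins first.
saliency = np.array([[0.2, 0.4, 0.1],
                     [0.3, 1.5, 0.2],
                     [0.1, 0.2, 0.3]])
loc, latency = wta_first_spike(saliency)
print("winning location:", loc, "latency (s):", latency)
```

Because the input map is fixed, the winning location and latency are deterministic for a given image, consistent with the observation above that the classic implementation produces fixed initial fixation durations for each image.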