To what extent can a computational model of the bottom–up visual attention predict what an observer is looking at? What is the contribution of the low-level visual features in the attention deployment? To answer these questions, a new spatio-temporal computational model is proposed. This model incorporates several visual features; therefore, a fusion algorithm is required to combine the different saliency maps (achromatic, chromatic and temporal). To quantitatively assess the model performances, eye movements were recorded while naive observers viewed natural dynamic scenes. Four completing metrics have been used. In addition, predictions from the proposed model are compared to the predictions from a state of the art model [Itti’s model (Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), 1254–1259)] and from three non-biologically plausible models (uniform, flicker and centered models). Regardless of the metric used, the proposed model shows significant improvement over the selected benchmarking models (except the centered model). Conclusions are drawn regarding both the influence of low-level visual features over time and the central bias in an eye tracking experiment.
Read full abstract