Abstract

A biologically inspired computational approach to modeling top-down and bottom-up visual attention is proposed in this paper. The model comprises a training phase and an attention phase. In the training phase, low-level visual feature dimensions of the object, such as color, intensity, orientation, and texture, are used; the features are extracted from the object itself and do not depend on background information. Each feature is represented by its mean and standard deviation, which are stored in long-term memory (LTM). In the attention phase, the corresponding features are extracted from the attended image. For each feature, a similarity map is obtained by comparing the training feature map with the attended feature map: the more similar the features, the stronger the response in the similarity map. All similarity maps are then combined into a top-down saliency map. At the same time, a bottom-up saliency map is computed from the contrast of the attended image itself. Finally, the top-down and bottom-up saliency maps are fused into a single saliency map. Experimental results indicate that when the attended object does not appear against a background similar to that of the training images, or when feature combinations change greatly between the training and attended images, the proposed approach outperforms the VOCUS top-down approach and Navalpakkam's approach.
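The described pipeline can be sketched in a few functions. The following is a minimal, illustrative Python sketch, not the paper's exact formulation: it assumes a Gaussian similarity measure against the stored (mean, standard deviation) feature model, a box-filter center-surround contrast as the bottom-up contrast measure, and equal-weight averaging for both the cross-feature combination and the final fusion. All function names and weighting choices are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def similarity_map(feature_map, mu, sigma, eps=1e-6):
    # Per-pixel similarity between an attended feature map and the learned
    # feature model (mean mu, std sigma) retrieved from long-term memory.
    # The closer a pixel's value is to mu, the stronger the response.
    return np.exp(-0.5 * ((feature_map - mu) / (sigma + eps)) ** 2)

def top_down_saliency(feature_maps, ltm):
    # Combine per-feature similarity maps into one top-down saliency map.
    # feature_maps: dict of name -> 2-D array from the attended image;
    # ltm: dict of name -> (mu, sigma) learned in the training phase.
    maps = [similarity_map(feature_maps[name], mu, sigma)
            for name, (mu, sigma) in ltm.items()]
    return np.mean(maps, axis=0)  # equal weighting is an assumption

def bottom_up_saliency(intensity, size=9):
    # Center-surround contrast: deviation of each pixel from its local
    # box-filtered mean, normalized to [0, 1]. A stand-in for the paper's
    # contrast-based bottom-up map.
    surround = uniform_filter(intensity.astype(float), size=size)
    contrast = np.abs(intensity - surround)
    return contrast / (contrast.max() + 1e-6)

def fuse(top_down, bottom_up, w=0.5):
    # Weighted fusion of the two saliency maps; w = 0.5 is illustrative.
    return w * top_down + (1.0 - w) * bottom_up

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))                 # toy attended image
    feats = {"intensity": img, "color": rng.random((64, 64))}
    ltm = {"intensity": (0.8, 0.1), "color": (0.3, 0.05)}  # trained stats
    saliency = fuse(top_down_saliency(feats, ltm), bottom_up_saliency(img))
    print(saliency.shape, saliency.min(), saliency.max())
```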
