Abstract

The function of visual attention is to identify interesting areas in the visual scene so that the limited computational resources of a human or an artificial machine can be dedicated to processing regions that potentially contain interesting objects. Early computational models of visual attention (Koch and Ullman, 1985) already suggested that attention consists of two functionally independent stages:

• in the preattentive stage, features are processed rapidly and in parallel over the entire visual field until the focus of attention has been identified, which triggers an eye movement towards the target area;

• in the second stage, computational resources are dedicated to processing information in the identified area while irrelevant or distracting percepts are ignored.

Visual attention can be either overt, driving and guiding eye movements, or covert, internally shifting the focus of attention from one image region to another without eye movements (Sun and Fisher, 2003). Here we are interested in visual attention that involves eye movements and in how to implement it on a humanoid robot. Overt shifts of attention from one selected area to another were demonstrated, for example, in face recognition experiments (Yarbus, 1967). Although the subjects in these experiments perceived faces as a whole, their eye movements showed that attention was shifted from one point to another while a face was being processed. The analysis of fixation points revealed that the subjects performed saccadic eye movements, which are very fast ballistic movements, to acquire data from the most informative areas of the image. Saccadic eye movements are not visually guided, both because high velocities disrupt vision and because a signal that the target had been reached would arrive long after the movement had overshot. The input to the motor system is the desired eye position, which is continuously compared to an efference copy of the internal representation of the current eye position.
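The two-stage scheme described above can be illustrated with a minimal sketch: a preattentive stage that computes a saliency map in parallel over the whole image via center-surround differences, followed by a winner-take-all selection of the focus of attention, which would serve as the saccade target. This is only a toy illustration of the general idea, not the model implemented in the paper; the single intensity feature, the Gaussian scales, and the simple winner-take-all rule are assumptions made here for brevity.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian blur implemented with two 1-D convolutions.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def saliency_map(intensity):
    # Preattentive stage: center-surround contrast computed in parallel
    # over the entire visual field (fine scale minus coarse scale, rectified).
    center = gaussian_blur(intensity, sigma=1.0)
    surround = gaussian_blur(intensity, sigma=4.0)
    return np.abs(center - surround)

def focus_of_attention(sal):
    # Winner-take-all: the most salient location becomes the saccade target.
    return np.unravel_index(np.argmax(sal), sal.shape)

# Toy scene: uniform background with one bright blob ("pop-out" target).
scene = np.zeros((64, 64))
scene[40:44, 20:24] = 1.0
target = focus_of_attention(saliency_map(scene))  # falls inside the blob
```

In a full model along the lines of Itti et al. (1998), several feature channels (intensity, color, orientation) at multiple scales would be combined into the saliency map, and an inhibition-of-return mechanism would suppress the winning location so that attention can shift to the next most salient region.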
Many computational models of preattentive processing have been influenced by the feature integration theory (Treisman and Gelade, 1980), which has resulted in several technical implementations, e.g. (Itti et al., 1998), including some implementations on humanoid robots (Driscoll et al., 1998; Breazeal and Scassellati, 1999; Stasse et al., 2000; Vijayakumar et
