3D TV is a turning point in the history of television. Here, technologies from computer graphics, computer vision, multimedia, telecommunications, broadcasting, and other related fields merge to expand the sensations provided by watching conventional 2D TV [1]. As a result, there has been much research on multi-view image processing and 3D display [2, 3]. However, most previous studies have focused on acquiring and displaying images while optimizing their 'naturalness', simplicity of computation, and real-time operation. Free-viewpoint TV (FTV) [4] takes this idea further. Free-viewpoint systems allow a viewer to look at a scene from whatever perspective they want rather than the one a director chooses to provide [1]. To generate a scene according to the user's requirements, a position in 3D space has to be defined as the viewpoint. To achieve this, we suggest an intelligent approach that automatically generates reference viewpoints based on a theory of human visual attention.

Many studies of visual attention and eye movements have shown that humans generally attend to only a few areas in an image rather than scanning the whole, and visual attention models provide a general approach to controlling the activities of active vision systems [5]. These models of selective visual attention have been proposed on the basis of evidence from psychology, psychophysics, physiology, and related fields, and can also be exploited for our purposes. Figure 1 shows an example of how our artificial human visual attention system can be used to obtain a base position from which to generate a scene that corresponds to a user's viewpoint. The system simultaneously computes the intensity of salience of an object in a given region. In fact, several objects or regions can be detected in terms of their individual saliencies, which can in turn be used to describe them. The system computes early visual features from a set of pre-attentive feature maps in a massively parallel way.
Activity from all feature maps is combined

Figure 1. Schematic diagram showing the proposed system based on a bottom-up approach to attention.
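The bottom-up pipeline described above (early feature maps computed independently, then combined into a single measure of salience) can be illustrated with a minimal sketch. This is not the authors' system: the three feature channels (intensity, red-green, and blue-yellow opponency), the box-blur center-surround operator, the min-max normalization, and all function names are assumptions chosen for illustration, loosely in the style of classic saliency-map models.

```python
from typing import List, Tuple

Grid = List[List[float]]
Pixel = Tuple[float, float, float]  # (r, g, b), each in [0, 1]


def box_blur(img: Grid, radius: int) -> Grid:
    """Mean over a (2*radius+1)^2 window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def center_surround(img: Grid, center_r: int = 0, surround_r: int = 2) -> Grid:
    """Absolute difference between a fine (center) and coarse (surround) scale."""
    c, s = box_blur(img, center_r), box_blur(img, surround_r)
    return [[abs(a - b) for a, b in zip(rc, rs)] for rc, rs in zip(c, s)]


def normalize(m: Grid) -> Grid:
    """Rescale a map to [0, 1]; a (near-)flat map carries no salience."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi - lo < 1e-12:
        return [[0.0] * len(row) for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def feature_maps(rgb: List[List[Pixel]]) -> List[Grid]:
    """Assumed early features: intensity and two color-opponency channels."""
    intensity = [[(r + g + b) / 3 for r, g, b in row] for row in rgb]
    red_green = [[r - g for r, g, b in row] for row in rgb]
    blue_yellow = [[b - (r + g) / 2 for r, g, b in row] for row in rgb]
    return [intensity, red_green, blue_yellow]


def saliency_map(rgb: List[List[Pixel]]) -> Grid:
    """Combine the normalized center-surround response of every feature map."""
    maps = [normalize(center_surround(f)) for f in feature_maps(rgb)]
    h, w = len(rgb), len(rgb[0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]


def most_salient(sal: Grid) -> Tuple[int, int]:
    """Return the (row, column) of the strongest salience value."""
    coords = ((y, x) for y in range(len(sal)) for x in range(len(sal[0])))
    return max(coords, key=lambda p: sal[p[0]][p[1]])


# Toy scene: a uniform gray image with a single red pixel.
image = [[(0.2, 0.2, 0.2)] * 9 for _ in range(9)]
image[4][6] = (1.0, 0.0, 0.0)
sal = saliency_map(image)
peak = most_salient(sal)
```

In this toy scene the peak of the combined map falls on the red pixel, and the value at that location plays the role of the salience intensity mentioned in the text. How such a 2D peak would be mapped to a reference viewpoint in 3D space is outside the scope of this sketch.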