This paper introduces a new form of representation for three-dimensional (3-D) video objects. We have developed a technique to extract disparity and texture data from video objects that are captured simultaneously with multiple-camera configurations. For this purpose, we derive an "area of interest" (AOI) for each of the camera views, which represents an area on the video object's surface that is best visible from this specific camera viewpoint. By combining all AOIs, we obtain the video object plane as an unwrapped surface of a 3-D object, containing all texture data visible from any of the cameras. This texture surface can be encoded like any 2-D video object plane, while the 3-D information is contained in the associated disparity map. It is then possible to reconstruct different viewpoints from the texture surface by simple disparity-based projection. The merits of the technique are efficient multiview encoding of single video objects and support for viewpoint adaptation functionality, which is desirable in mixing natural and synthetic images. We have performed experiments with the MPEG-4 video verification model, where the disparity map is encoded by use of the tools provided for grayscale alpha data encoding. Due to its simplicity, the technique is suitable for applications that require real-time viewpoint adaptation toward video objects.
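The disparity-based projection mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes purely horizontal disparity, a single scalar viewpoint parameter (here called `shift_scale`, a hypothetical name), and integer pixel shifts. The function `project_view` is likewise hypothetical. Each texture sample is shifted by its scaled disparity, and where two samples land on the same target pixel, the one with the larger disparity (assumed to be nearer to the camera) is kept.

```python
def project_view(texture, disparity, shift_scale, background=0):
    """Forward-warp a texture surface using per-pixel horizontal disparity.

    texture, disparity: 2-D lists (rows of pixels) with identical shape.
    shift_scale: hypothetical viewpoint parameter; 0 reproduces the input.
    background: value left in target pixels that receive no sample (holes).
    """
    height = len(texture)
    width = len(texture[0]) if height else 0
    output = [[background] * width for _ in range(height)]
    # Disparity of the sample currently stored at each target pixel, used
    # to resolve collisions in favour of the larger disparity (assumed nearer).
    written = [[None] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            target_x = x + int(round(shift_scale * d))
            if not 0 <= target_x < width:
                continue  # sample falls outside the target view
            if written[y][target_x] is None or d > written[y][target_x]:
                output[y][target_x] = texture[y][x]
                written[y][target_x] = d
    return output
```

Target pixels that receive no sample keep the `background` value; a practical system would need to fill such holes, for instance by interpolation, which this sketch omits.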