Abstract
This paper presents a novel video compression strategy, based on a structured video representation, that generates a content-addressable bit-stream supporting retrieval and composition. The structured video representation, extracted by a spatiotemporal segmentation algorithm, comprises a hierarchy of sequences, episodes, shots and motion events that constitute the building blocks of a scripted video stream. The main characteristics of this approach are that (i) it relies on the assumption that temporal redundancy can be efficiently exploited through temporal coherence on a tubewise basis, (ii) it controls the bit assignment according to motion complexity and temporal relations among video elements, (iii) it parameterizes the motion of the entire tube with an affine motion model, and (iv) it synthesizes images from representative texture patterns or anatomical models. The coding-decoding system is based on the construction of a specific video transformation (a tube code) which, when interpolated and composed according to the temporal relations, produces a sequence of images that approximates the original. Coding efficiency is enhanced because the structured video representation allows optimal reduction of temporal redundancy. We show how to design such a system for coding the CIF-format color digital video ‘Miss America’ (30 frames/s) at rates of 10 and 37 kb/s, using a 3-D face wireframe model and MC-DCT, respectively, to encode textural changes. The structured video representation, once marked semantically, can facilitate interactive and content-based operations on image sequences, such as editing, browsing, and content-based access and filtering.
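For orientation, a six-parameter affine motion model of the kind named in (iii) is commonly written as below. The abstract does not give the paper's own parameterization, so the symbols here (pixel position $(x, y)$, displaced position $(x', y')$, and per-tube parameters $a_1, \dots, a_6$) are illustrative assumptions rather than the authors' notation.

```latex
\begin{aligned}
x' &= a_1 x + a_2 y + a_3, \\
y' &= a_4 x + a_5 y + a_6 .
\end{aligned}
```

Under this form, $a_3$ and $a_6$ carry translation, while $a_1, a_2, a_4, a_5$ capture rotation, scaling and shear, so a single parameter set could describe the motion of an entire tube.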