Image and video indexing using vector quantization

F Idris, S Panchanathan

doi:10.1007/s001380050058

Abstract

Visual database systems require efficient indexing to enable fast access to images and video. In addition, the large memory capacity and channel bandwidth required to store and transmit visual data necessitate the use of compression techniques. Future multimedia applications are likely to store and transmit visual information increasingly in compressed form. Hence, indexing the visual content in the compressed domain is expected to yield significant savings in computational complexity. Vector quantization (VQ) is an efficient technique for low-bit-rate image and video compression. In addition, the low complexity of its decoder makes VQ attractive for low-power systems and for applications that require fast decoding. Most importantly, VQ is naturally an indexing technique: a block of pixels is compactly represented by an index (label) that identifies a codeword in a codebook. In this thesis, we propose the novel concept of using VQ for joint compression and indexing of images and video. The images or video frames are compressed using VQ, and the labels and codewords are employed in indexing the visual content. First, we present a review of image/video compression and indexing. We then propose two techniques for image indexing in the VQ compressed domain. In the first technique, the histogram of codewords weighted by the number of labels is used as the feature vector for indexing. In the second technique, the histogram of the labels used to represent an image serves as the index. We also propose a new technique based on adaptive wavelet VQ, which improves both coding and retrieval performance. Here, the images are decomposed using a wavelet transform, followed by VQ of the transform coefficients. A usage map of codewords is generated for each image and stored along with the image.
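The label-histogram index and the codeword usage map described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy codebook, the 4-pixel blocks, the binary form of the usage map, and the choice of histogram intersection as the similarity measure are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def nearest_label(block, codebook):
    """Return the index of the codeword closest to `block` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((b - c) ** 2 for b, c in zip(block, codebook[k])),
    )


def encode(blocks, codebook):
    """VQ-encode an image given as a list of pixel blocks; returns one label per block."""
    return [nearest_label(block, codebook) for block in blocks]


def label_histogram(labels, codebook_size):
    """Normalised histogram of labels, used here as the image index."""
    counts = Counter(labels)
    return [counts[k] / len(labels) for k in range(codebook_size)]


def usage_map(labels, codebook_size):
    """Binary map marking which codewords the image uses (assumed form)."""
    used = set(labels)
    return [1 if k in used else 0 for k in range(codebook_size)]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalised histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 4-entry codebook of 2x2 blocks, flattened to 4 pixel values each.
codebook = [(0, 0, 0, 0), (128, 128, 128, 128), (255, 255, 255, 255), (0, 255, 0, 255)]
image_a = [(3, 1, 0, 2), (250, 255, 251, 254), (130, 125, 128, 127), (2, 0, 1, 0)]
image_b = [(0, 4, 2, 1), (1, 0, 0, 3), (126, 130, 129, 128), (5, 250, 3, 252)]

labels_a = encode(image_a, codebook)
labels_b = encode(image_b, codebook)
hist_a = label_histogram(labels_a, len(codebook))
hist_b = label_histogram(labels_b, len(codebook))
similarity = histogram_intersection(hist_a, hist_b)
```

Because the index is built from the labels alone, comparing two images in this sketch never touches the decoded pixels, which is the source of the complexity savings the abstract refers to.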
In the retrieval process, the usage map of the VQ-encoded query image is compared with the usage maps of the target images in the database. Since video has both spatial and temporal dimensions, a straightforward extension of the image indexing techniques to video indexing is inefficient. We therefore propose to employ both spatial and temporal features for efficient indexing of video clips. The video sequence is partitioned into shots using the label maps of the individual frames; the camera operations and motion within each shot are then determined by further processing the label maps. Each shot is then represented by a spatio-temporal index. The spatial index represents the content of the key frame (image) of a shot, while the temporal index represents the motion and camera operations within the shot. Detailed simulations have been carried out using a large database of images and video sequences. The simulation results demonstrate the excellent retrieval performance of the proposed techniques at a significantly reduced computational complexity.
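One plausible way to partition a sequence into shots from label maps alone is to compare the label histograms of consecutive frames and declare a boundary when they differ sharply. The sketch below assumes that approach; the L1 distance, the threshold value, and the function names are illustrative choices, not details taken from the thesis.

```python
from collections import Counter


def frame_label_histogram(label_map, codebook_size):
    """Normalised histogram of the VQ labels in one frame's 2-D label map."""
    flat = [label for row in label_map for label in row]
    counts = Counter(flat)
    return [counts[k] / len(flat) for k in range(codebook_size)]


def detect_shot_boundaries(label_maps, codebook_size, threshold=1.0):
    """Return the frame indices at which a new shot is assumed to start.

    A boundary is declared when the L1 distance between the label
    histograms of consecutive frames exceeds `threshold`. For normalised
    histograms this distance lies in [0, 2].
    """
    if not label_maps:
        return []
    boundaries = []
    previous = frame_label_histogram(label_maps[0], codebook_size)
    for i in range(1, len(label_maps)):
        current = frame_label_histogram(label_maps[i], codebook_size)
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Toy sequence: frames 0-1 use codewords 0 and 1, frames 2-3 use codewords 2 and 3.
frames = [
    [[0, 0], [1, 1]],
    [[0, 0], [1, 0]],
    [[2, 3], [3, 2]],
    [[2, 3], [3, 3]],
]
cuts = detect_shot_boundaries(frames, codebook_size=4)
```

In this toy sequence the only boundary found is at frame 2, where the set of codewords in use changes completely. Estimating camera operations and motion within a shot would need the spatial layout of the label maps, which this histogram-only sketch discards.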
