Abstract

The scale-invariant feature transform (SIFT) maps a grayscale image to a set of local features that are invariant to image scale and rotation and robust to changes in viewpoint. Because of these properties, SIFT has been used successfully for object recognition and content-based image retrieval. Its biggest drawback is that it uses only grayscale information and misses the important visual information carried by color. In this paper, we present a novel color feature extraction algorithm that addresses this problem, and we propose a new clustering strategy based on clustering ensembles for video shot detection. Building on Fibonacci lattice quantization, we develop a novel color global scale-invariant feature transform (CGSIFT) that better describes the color content of video frames for shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then applies SIFT to the quantized color-index image. As the second step of CGSIFT, we develop a new spatial description method that uses small image regions to represent global color features. Clustering ensembles, which focus on knowledge reuse, are then applied to obtain better clustering results than single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering-ensemble strategy shows very promising results for video shot detection.
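The clustering-ensemble idea can be illustrated with a small evidence-accumulation sketch: run k-means several times from random initializations, record how often each pair of samples lands in the same cluster, and derive consensus labels from that co-association matrix. The helper names and the greedy consensus step below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def kmeans(X, k, rng, iters=20):
    # Minimal k-means: random initial centers, then Lloyd iterations.
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassociation(X, k=2, runs=10, seed=0):
    # Accumulate, over several k-means runs, how often each pair of
    # samples co-clusters (evidence accumulation / knowledge reuse).
    rng = np.random.default_rng(seed)
    n = len(X)
    M = np.zeros((n, n))
    for _ in range(runs):
        labels = kmeans(X, k, rng)
        M += labels[:, None] == labels[None, :]
    return M / runs

def consensus_labels(M, threshold=0.5):
    # Greedy consensus (a simple stand-in): join points whose
    # co-association frequency exceeds the threshold.
    labels = -np.ones(len(M), dtype=int)
    cur = 0
    for i in range(len(M)):
        if labels[i] == -1:
            labels[M[i] > threshold] = cur
            cur += 1
    return labels

# Toy data: two well-separated groups of frame feature vectors.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
M = coassociation(X, k=2, runs=10)
labels = consensus_labels(M)
```

The co-association matrix lets the ensemble reuse the knowledge in each base clustering, so a few unlucky k-means initializations are outvoted by the majority of runs.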

Highlights

  • The recent rapid growth of multimedia databases and the increasing demand for online access to them have brought content-based video retrieval (CBVR) to the attention of many researchers.

  • We focus on clustering-based shot detection [3,4,5,6,7,8,9,10,11,12,13,14], which can be considered a combination of feature-based and statistics-based methods.

  • To avoid bias from the improved clustering strategy proposed in this paper, we applied the same k-means clustering to both the proposed color global scale-invariant feature transform (CGSIFT) and the traditional scale-invariant feature transform (SIFT) for comparison.


Summary

INTRODUCTION

The recent rapid growth of multimedia databases and the increasing demand for online access to them have brought content-based video retrieval (CBVR) to the attention of many researchers. We focus on clustering-based shot detection [3,4,5,6,7,8,9,10,11,12,13,14], which can be considered a combination of feature-based and statistics-based methods. We adopt a powerful color quantization method, Fibonacci lattice quantization [25], to quantize color information and generate a palette of color indices for SIFT. Based on this approach, we propose a novel color feature descriptor that uses the global context of the video frame.
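A Fibonacci lattice places palette points along a golden-angle spiral, so a small number of indices covers the color plane with roughly uniform density. The sketch below works in a plain 2-D chroma plane for brevity (the method of [25] operates in a perceptual color space such as CIELAB); the function names, `max_radius`, and the nearest-neighbor indexing are illustrative assumptions.

```python
import numpy as np

GOLDEN_ANGLE = np.pi * (3.0 - np.sqrt(5.0))  # ~2.39996 rad

def fibonacci_lattice_palette(n_colors=64, max_radius=60.0):
    # Place palette points on a Fibonacci spiral: radius grows as
    # sqrt(n) for even areal density, angle steps by the golden angle.
    n = np.arange(1, n_colors + 1)
    r = max_radius * np.sqrt(n / n_colors)
    theta = n * GOLDEN_ANGLE
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

def quantize(points, palette):
    # Map each 2-D color point to the index of its nearest palette entry.
    d = np.linalg.norm(points[:, None, :] - palette[None, :, :], axis=2)
    return np.argmin(d, axis=1)

palette = fibonacci_lattice_palette()
pts = np.array([[0.0, 0.0], [50.0, 10.0]])
idx = quantize(pts, palette)
```

The resulting index image (one small integer per pixel) is what SIFT would then be applied to in the CGSIFT pipeline described above.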

Scale-invariant feature transform
Clustering ensemble
Retain color information by Fibonacci lattice quantization
Join global context information into color SIFT
VIDEO SHOT DETECTION USING CLUSTERING ENSEMBLES
PROCESSING TIME AND STORAGE SPACE ANALYSIS
Test videos and ground truth
Single clustering versus clustering ensembles and CGSIFT versus SIFT
TRECVID
CONCLUSIONS