Abstract

We present a novel spatio-temporal descriptor for efficiently representing a video object for content-based video retrieval. Spatial and temporal features are integrated in a unified framework for retrieving similar video shots. A sequence of orthogonal processing steps, using a pair of 1-D multiscale and multispectral filters, on the space-time volume (STV) of a video object (VOB) produces a gradually evolving (smoother) surface. Zero-crossing contours (2-D), computed using the mean curvature on this evolving surface, are stacked in layers to yield a hilly (3-D) surface: the joint multispectro-temporal curvature scale space (MST-CSS) representation of the video object. Peaks and valleys (saddle points) are detected on the MST-CSS surface for feature representation and matching. The cost function for matching a query video shot against a model compares a pair of 3-D point sets, together with their attributes (local curvature) and the 3-D orientations of the finally smoothed STV surfaces. Experiments have been performed on simulated and real-world video shots, using the precision-recall metric for the performance study. The system is compared with several state-of-the-art methods that use shape and motion trajectory for VOB representation. Our unified approach outperforms approaches that combine match costs obtained from separate shape and motion-trajectory representations, as well as our previous work on a simple joint spatio-temporal descriptor (3-D-CSS).
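The MST-CSS descriptor builds on the classical curvature scale space (CSS) idea: smooth a contour at progressively larger scales and record where its curvature changes sign, so that the surviving zero-crossings at coarse scales summarize the dominant shape structure. The sketch below illustrates only that underlying 2-D CSS step, not the paper's full spatio-temporal pipeline; the contour, scales, and function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # Normalized 1-D Gaussian kernel of half-width `radius`.
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def circular_smooth(signal, sigma):
    # Smooth a periodic (closed-contour) signal by circular Gaussian convolution.
    radius = int(4 * sigma) + 1
    g = gaussian_kernel(sigma, radius)
    padded = np.concatenate([signal[-radius:], signal, signal[:radius]])
    return np.convolve(padded, g, mode="same")[radius:-radius]

def curvature_zero_crossings(x, y, sigma):
    # Smooth the contour coordinates, then find where the curvature of the
    # smoothed curve changes sign. Only the sign of the curvature numerator
    # x'y'' - y'x'' is needed, so the denominator is omitted.
    xs, ys = circular_smooth(x, sigma), circular_smooth(y, sigma)
    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    signs = np.sign(dx * ddy - dy * ddx)
    return np.nonzero(signs != np.roll(signs, 1))[0]  # sign-change positions

# Illustrative closed contour: an ellipse-like curve with three concave lobes.
u = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
x = (1.0 + 0.3 * np.cos(3 * u)) * np.cos(u)
y = (1.0 + 0.3 * np.cos(3 * u)) * np.sin(u)

# A CSS representation stacks zero-crossing positions over increasing scale.
css = {s: curvature_zero_crossings(x, y, s) for s in (1.0, 4.0, 16.0)}
for s, zc in css.items():
    print(f"sigma={s:5.1f}: {len(zc)} curvature zero-crossings")
```

In MST-CSS these per-scale zero-crossing contours are stacked in layers over the filtering sequence, producing the 3-D surface on which peaks and valleys are then detected.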
