Semantic video content annotation at the object level

Vanessa El-Khoury, Harald Kosch, David Coquil, Martin Jergler

doi:10.1145/2428955.2428991

Abstract

A vital prerequisite for fine-grained video content processing (indexing, querying, retrieval, adaptation, etc.) is the production of accurate metadata describing the structure and semantics of the video. Several annotation tools presented in the literature generate metadata at different granularities (i.e., scenes, shots, frames, objects). These tools have a number of limitations with respect to the annotation of objects. Although they provide functionality to localize and annotate an object in a frame, propagating this information to subsequent frames still requires human intervention. Furthermore, they are based on video models that lack expressiveness along the spatial and semantic dimensions. To address these shortcomings, we propose the Semantic Video Content Annotation Tool (SVCAT) for structural and high-level semantic annotation. SVCAT is a semi-automatic annotation tool compliant with the MPEG-7 standard; it produces metadata according to an object-based video content model described in this paper. In particular, the novelty of SVCAT lies in its automatic propagation of object localization and description metadata, realized by tracking each object's contour through the video, which drastically reduces the annotator's workload. Experimental results show that SVCAT provides accurate metadata to object-based applications, in particular exact contours of multiple deformable objects.
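The propagation idea summarized above can be illustrated with a minimal sketch. It is not SVCAT's implementation: the names `ObjectAnnotation`, `propagate`, and `shift_tracker` are hypothetical, and the fixed-offset tracker is only a stand-in for a real contour tracker. The sketch assumes an annotator draws one contour on a start frame and a tracker derives the contour of each following frame from the previous one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Contour = List[Point]
# Hypothetical tracker signature: (previous contour, target frame index) -> new contour
Tracker = Callable[[Contour, int], Contour]


@dataclass
class ObjectAnnotation:
    """One annotated object: a semantic label plus one contour per frame index."""
    label: str
    contours: Dict[int, Contour] = field(default_factory=dict)


def propagate(ann: ObjectAnnotation, start: int, end: int, track: Tracker) -> None:
    """Fill in contours for frames start+1..end from the contour given at `start`."""
    if start not in ann.contours:
        raise KeyError(f"no contour for frame {start}")
    for frame in range(start + 1, end + 1):
        # Each new contour is derived from the contour of the preceding frame.
        ann.contours[frame] = track(ann.contours[frame - 1], frame)


def shift_tracker(dx: float, dy: float) -> Tracker:
    """Stand-in tracker (assumption): translates the contour by a fixed offset per frame."""
    def track(contour: Contour, _frame: int) -> Contour:
        return [(x + dx, y + dy) for x, y in contour]
    return track


# The annotator outlines the object once, on frame 0 ...
car = ObjectAnnotation("car", {0: [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]})
# ... and the label and contour are propagated to frames 1..3 without further input.
propagate(car, start=0, end=3, track=shift_tracker(1.0, 0.5))
```

Passing the tracker in as a function keeps the annotation model independent of the tracking method, so a tracker that handles deformable contours could replace the stand-in without changing the data structure.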

Full Text