Marginalized multi-layer multi-instance kernel for video concept detection

Zheng-Jun Zha,Tao Mei,Richang Hong,Zhiwei Gu

doi:10.1016/j.sigpro.2012.08.026

Abstract

Video concept detection has been extensively studied in recent years. Most of the existing video concept detection approaches have treated video as a flat data sequence. However, video is essentially a kind of media with hierarchical structure, including multiple layers (e.g., video shot, frame, and region) and multiple instance relationship embedded in each pair of contiguous layers. In this paper, we propose a novel kernel, termed marginalized multi-layer multi-instance (MarMLMI) kernel for video concept detection. Different from most existing methods, the proposed MarMLMI kernel exploits the hierarchical structure of video, i.e., both the multi-layer structure and the multi-instance relationship. Furthermore, the instance label ambiguity in multi-instance setting is addressed by using the technology of marginalized kernel. We perform video concept detection on a real-world video corpus: the TREC video retrieval evaluation (TRECVID) benchmark and compare the proposed MarMLMI kernel to representative existing approaches. The experimental results demonstrate the effectiveness of the proposed MarMLMI kernel.

Full Text