Abstract

We consider the problem of event detection in video for scenarios where only a few, or even zero, examples are available for training. For this challenging setting, the prevailing solutions in the literature rely on a semantic video representation obtained from thousands of pretrained concept detectors. Different from existing work, we propose a new semantic video representation that is based on freely available socially tagged videos only, without the need to train any intermediate concept detectors. We introduce a simple algorithm that propagates tags from a video's nearest neighbors, similar in spirit to those used for image retrieval, but redesign it for video event detection by including video source set refinement and varying the video tag assignment. We call our approach TagBook and study its construction, descriptiveness, and detection performance on the TRECVID 2013 and 2014 multimedia event detection datasets and the Columbia Consumer Video dataset. Despite its simple nature, the proposed TagBook video representation is remarkably effective for few-example and zero-example event detection, even outperforming very recent state-of-the-art alternatives built on supervised representations.
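The abstract describes the core mechanism only at a high level: tags are propagated to a test video from its nearest neighbors in a collection of socially tagged videos. The sketch below is one illustrative reading of such nearest-neighbor tag propagation, not the paper's exact formulation; the function name, the use of cosine similarity, the neighborhood size `k`, and the soft/hard weighting option are all assumptions made for the example.

```python
import numpy as np

def tagbook_representation(query_feature, source_features, source_tags,
                           vocab_size, k=100, weighted=True):
    """Build a TagBook-style tag vector for a query video by propagating
    tags from its k nearest neighbors in a socially tagged video collection.

    query_feature   : (d,) visual feature of the query video (precomputed)
    source_features : (n, d) visual features of the socially tagged videos
    source_tags     : list of n lists of tag indices, one list per source video
    vocab_size      : size of the tag vocabulary
    k               : number of nearest neighbors to propagate tags from
    weighted        : True  -> weight each neighbor's tags by its similarity
                      False -> hard (binary) tag assignment per neighbor
    """
    # Cosine similarity between the query and every source video (assumption:
    # any similarity over visual features could be substituted here).
    q = query_feature / (np.linalg.norm(query_feature) + 1e-12)
    S = source_features / (np.linalg.norm(source_features, axis=1,
                                          keepdims=True) + 1e-12)
    sims = S @ q

    # Keep the k most similar source videos as the neighborhood.
    neighbors = np.argsort(-sims)[:k]

    # Accumulate the neighbors' tags into a vocabulary-sized vector.
    tagbook = np.zeros(vocab_size)
    for idx in neighbors:
        w = sims[idx] if weighted else 1.0
        for tag in source_tags[idx]:
            tagbook[tag] += w

    # L1-normalize so representations remain comparable across queries.
    total = tagbook.sum()
    return tagbook / total if total > 0 else tagbook
```

The resulting vector serves as the semantic video representation: with a few labeled examples it can feed a standard classifier, and in the zero-example case it can be matched directly against a tag-based description of the event.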
