Multimedia event detection using visual concept signatures

Ehsan Younessian,Teruko Mitamura,Alex Hauptmann,Michael Quinn

doi:10.1117/12.2008425

Abstract

Multimedia Event Detection (MED) is a multimedia retrieval task with the goal of finding videos of a particular event in a large-scale Internet video archive, given example videos and text descriptions. In this paper, we mainly focus on an 'ad-hoc' scenario in MED where we do not use any example video. We aim to retrieve test videos based on their visual semantics using a Visual Concept Signature (VCS) generated for each event only derived from the event description provided as the query. Visual semantics are described using the Semantic INdexing (SIN) feature which represents the likelihood of predefined visual concepts in a video. To generate a VCS for an event, we project the given event description to a visual concept list using the proposed textual semantic similarity. Exploring SIN feature properties, we harmonize the generated visual concept signature and the SIN feature to improve retrieval performance. We conduct different experiments to assess the quality of generated visual concept signatures with respect to human expectation, and in the context of the MED task to retrieve the SIN feature of videos in the test dataset when we have no or only very few training videos.

Full Text