Abstract

Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers, each based on a single data type: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. Difference coding improved the performance of the low-level visual and motion features. Event detection with the semantic visual concepts was nearly as accurate as detection with the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as the arithmetic mean, perform as well as or better than more complex fusion methods. SESAME's performance in the 2012 TRECVID MED evaluation was among the best reported.
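To make the simplest fusion method concrete, here is a minimal sketch of arithmetic-mean late fusion over per-classifier event scores. It is an illustration, not SESAME's implementation: the function name, the classifier names, the assumption that scores are already on a comparable scale (for example, [0, 1]), and the choice to average only over classifiers that produced a score for a given video (for example, a clip with no speech for ASR) are all hypothetical.

```python
from typing import Dict, List, Optional


def arithmetic_mean_fusion(scores: Dict[str, List[Optional[float]]]) -> List[float]:
    """Fuse per-classifier event detection scores by arithmetic mean.

    `scores` maps a (hypothetical) classifier name to one score per video.
    None marks a video that the classifier could not score. Scores are
    ASSUMED to be on a comparable scale already; no normalization is done.
    """
    lengths = {len(per_video) for per_video in scores.values()}
    if len(lengths) != 1:
        raise ValueError("every classifier must score the same list of videos")
    (n_videos,) = lengths

    fused: List[float] = []
    for i in range(n_videos):
        # ASSUMPTION: average only over classifiers that scored video i.
        available = [per_video[i] for per_video in scores.values() if per_video[i] is not None]
        # ASSUMPTION: a video no classifier could score gets the lowest score.
        fused.append(sum(available) / len(available) if available else 0.0)
    return fused


if __name__ == "__main__":
    example = {
        "visual": [0.9, 0.2, 0.6],
        "motion": [0.7, 0.4, 0.5],
        "asr": [0.8, None, 0.1],  # second clip has no usable speech
    }
    fused = arithmetic_mean_fusion(example)
    # Rank videos from most to least likely to contain the event.
    ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    print(fused, ranking)
```

Because the mean has no learned parameters, it cannot overfit the small set of example clips that defines an event, which is one plausible reason simple fusion holds up against more complex methods; in practice the per-classifier scores would usually need calibration or normalization first.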

Highlights

  • The MED evaluation uses the Heterogeneous Audio Visual Internet Collection (HAVIC) video data collection [18], which is a large corpus of Internet multimedia files collected by the Linguistic Data Consortium

  • The work in this paper focuses on SEarch with Speed and Accuracy for Multimedia Events (SESAME), an MED system in which an event is specified as a set of video clip examples

Introduction

The goal of multimedia event detection (MED) is to detect user-defined events of interest in massive, continuously growing video collections, such as those found on the Internet. Events are complex: they may include actions (hammering, pouring liquid) and activities (dancing) occurring in different scenes (street, kitchen). Some events may be process-oriented, with an expected sequence of stages, actions, or activities (making a sandwich or repairing an appliance); other events may be a set of ongoing activities with no particular beginning or end (birthday party or parade). An event may be observed in only a portion of the video clip, and relevant clips may contain extraneous content.
