A Fine Granularity Object-Level Representation for Event Detection and Recounting

Hao Zhang, Chong-Wah Ngo

doi:10.1109/tmm.2018.2884478

Abstract

Multimedia events such as “birthday party” usually involve complex interactions between humans and objects. Unlike actions and sports, these events rarely contain distinctive motion patterns that can be exploited for recognition. To encode the rich object content of an event, a common practice is to tag each video frame with object labels, represented as a vector of object-appearance probabilities. These vectors are then pooled across frames to obtain a video-level representation. Current practice suffers from two deficiencies that stem from the direct use of deep convolutional neural networks (DCNN) and standard feature pooling techniques. First, the max-pooling and softmax layers in a DCNN overemphasize the primary object or scene in a frame, producing a sparse vector that overlooks secondary or small objects. Second, pooling sparse vectors with a max or average operator makes the video-level feature unpredictable in modeling the object composition of an event. To address these problems, this paper proposes a new video representation, named Object-VLAD, which treats each object equally and encodes the objects into a vector for multimedia event detection. Furthermore, the vector can be flexibly decoded to identify evidence, such as key objects, that recounts why a video is retrieved for an event of interest. Experiments on the MED13 and MED14 datasets verify the merit of Object-VLAD, which consistently outperforms several state-of-the-art methods in both event detection and recounting.
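The abstract does not spell out how Object-VLAD is computed, so the following is only a minimal sketch of the two ideas it contrasts: max or average pooling of per-frame score vectors, and the generic VLAD aggregation (summing residuals to the nearest codebook center, then L2-normalizing) that the name suggests. The function names, the toy vectors, and the two-center codebook are all assumptions made for illustration, not the authors' implementation.

```python
import math


def max_pool(frames):
    """Element-wise maximum over per-frame score vectors."""
    return [max(col) for col in zip(*frames)]


def avg_pool(frames):
    """Element-wise mean over per-frame score vectors."""
    return [sum(col) / len(col) for col in zip(*frames)]


def vlad_encode(descriptors, centers):
    """Generic VLAD (assumed here, not taken from the paper):
    hard-assign each descriptor to its nearest center, accumulate the
    residuals per center, concatenate, and L2-normalize."""
    dim = len(centers[0])
    residuals = [[0.0] * dim for _ in centers]
    for d in descriptors:
        # Nearest center by squared Euclidean distance.
        k = min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(d, centers[i])),
        )
        for j in range(dim):
            residuals[k][j] += d[j] - centers[k][j]
    flat = [v for r in residuals for v in r]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Hypothetical per-frame object probabilities (3 frames, 3 object classes).
frames = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2], [0.1, 0.9, 0.0]]
video_max = max_pool(frames)
video_avg = avg_pool(frames)

# Hypothetical per-object descriptors and a toy 2-center codebook.
object_descriptors = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
codebook = [[0.0, 0.0], [2.0, 2.0]]
video_vlad = vlad_encode(object_descriptors, codebook)
```

In this toy case every object descriptor contributes one residual regardless of how dominant its source object is in the frame, whereas the pooled score vectors are driven by the few large entries in each frame.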

Journal: IEEE Transactions on Multimedia
Publication Date: Jun 1, 2019
Citations: 14
