Complex event detection via attention-based video representation and classification

Zhicheng Zhao, Rui Xiang, Fei Su

doi:10.1007/s11042-017-5058-2

Abstract

Multimedia event detection (MED), an important task in managing unconstrained web videos, has attracted wide attention recently. However, MED remains challenging because events are highly abstract, scenes vary widely, and individuals interact frequently. In this paper, we propose a novel MED algorithm based on attention-based video representation and classification. First, inspired by the human selective attention mechanism, an attention-based saliency localization network (ASLN) is constructed to quickly predict the semantically salient objects in video frames. Second, to represent the salient objects and their surroundings in a complementary way, two convolutional neural network (CNN) features, a local saliency feature and a global feature, are extracted from the salient objects and from the whole feature map, respectively. Third, after the two features are bound together, the Vector of Locally Aggregated Descriptors (VLAD) is applied to encode them into the video representation. Finally, linear support vector machine (SVM) classifiers are trained for classification. We extensively evaluate the method on the TRECVID MED14_10Ex, MED14_100Ex and Columbia Consumer Video (CCV) datasets. Experimental results show that the proposed single model outperforms state-of-the-art approaches on all three real-world video datasets, demonstrating its effectiveness.
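The binding and VLAD encoding steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two features are bound, so plain concatenation is assumed here, and the function names `bind_features` and `vlad_encode`, the pre-learned codebook `centers`, and the final L2 normalization are all illustrative assumptions.

```python
import math


def bind_features(local_feature, global_feature):
    """Bind a local saliency feature and a global feature.

    Assumption: binding is plain concatenation; the abstract does not specify it.
    """
    return list(local_feature) + list(global_feature)


def vlad_encode(descriptors, centers):
    """Encode a set of descriptors as a single VLAD vector.

    descriptors: list of D-dimensional vectors (e.g. one bound feature per frame).
    centers: list of K D-dimensional codebook centers, assumed to be learned
             beforehand (for example with k-means).
    """
    k, d = len(centers), len(centers[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest center (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])),
        )
        # Accumulate the residual between the descriptor and that center.
        for j in range(d):
            residuals[nearest][j] += x[j] - centers[nearest][j]
    # Flatten the K residual sums into one vector of length K * D.
    flat = [v for row in residuals for v in row]
    # L2-normalize so that videos with different frame counts are comparable.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy usage: three frames, each with a 1-D local and a 1-D global feature.
frames = [bind_features([1.0], [0.0]),
          bind_features([0.0], [1.0]),
          bind_features([11.0], [10.0])]
codebook = [[0.0, 0.0], [10.0, 10.0]]
video_vector = vlad_encode(frames, codebook)
```

In a full pipeline, each resulting video vector would be passed to a linear SVM classifier, as the abstract describes; that step is omitted here.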

Journal: Multimedia Tools and Applications
Publication Date: Aug 10, 2017
Citations: 7

Similar Papers

Author response: Invariant representation of physical stability in the human brain
RT Pramod ... Joshua B Tenenbaum
09 Feb 2022

VLAD encoded Deep Convolutional features for unconstrained face verification
Jingxiao Zheng ... Rama Chellappa
01 Dec 2016

Traffic scene recognition based on deep CNN and VLAD spatial pyramids
Fang-Yu Wu ... Bai-Ling Zhang
01 Jul 2017

Fusion of Global and Local Deep Features Using Bag of Words and VLAD Models for Human Activity Recognition
Amany Abdelbaky ... Saleh Aly
01 Nov 2020
