A component-based video content representation for action recognition

Vida Adeli, Ehsan Fazl-Ersi, Ahad Harati

doi:10.1016/j.imavis.2019.08.009

Abstract

This paper investigates the challenging problem of action recognition in videos and proposes a new component-based approach for video content representation. Although satisfactory action recognition performance has already been obtained for certain scenarios, many existing solutions require fully annotated video datasets in which the region of the activity in each frame is specified by a bounding box. Another group of methods requires auxiliary techniques to extract human-related areas in the video frames before actions can be recognized accurately. In this paper, a Weakly-Supervised Learning (WSL) framework is introduced that eliminates the need for per-frame annotations and learns video representations that improve recognition accuracy and also highlight the activity-related regions within each frame. To this end, two new representation ideas are proposed: one focuses on representing the main components of an action, i.e. actionness regions, and the other focuses on encoding the background context to represent general and holistic cues. A three-stream CNN is developed, which takes the two proposed representations and combines them with a motion-encoding stream. Temporal cues in each of the three streams are modeled through LSTM, and finally fully-connected neural network layers are used to fuse the streams and produce the final video representation. Experimental results on four challenging datasets demonstrate that the proposed Component-based Multi-stream CNN model (CM-CNN), trained in a WSL setting, outperforms state-of-the-art action recognition methods, including fully-supervised approaches.
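
The fusion pipeline described in the abstract (three streams, temporal modeling with LSTM, fully-connected fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the bias-free LSTM, the single fully-connected layer, the softmax output, and all names (`TinyLSTM`, `fuse_streams`, `FEAT_DIM`, and so on) are assumptions made for illustration, and random vectors stand in for the per-frame CNN features of each stream.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyLSTM:
    """Minimal LSTM (no biases, an assumption) that returns the last hidden state."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.W = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifog"}

    def forward(self, frames):
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        for x in frames:
            xh = list(x) + h
            i = [sigmoid(z) for z in matvec(self.W["i"], xh)]    # input gate
            f = [sigmoid(z) for z in matvec(self.W["f"], xh)]    # forget gate
            o = [sigmoid(z) for z in matvec(self.W["o"], xh)]    # output gate
            g = [math.tanh(z) for z in matvec(self.W["g"], xh)]  # candidate cell state
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

STREAMS = ("actionness", "context", "motion")

def fuse_streams(streams, lstms, W_fc):
    """Concatenate each stream's last LSTM state, then apply one FC layer and softmax."""
    fused = []
    for name in STREAMS:
        fused += lstms[name].forward(streams[name])
    return softmax(matvec(W_fc, fused))

rng = random.Random(0)
FEAT_DIM, HID_DIM, NUM_CLASSES, NUM_FRAMES = 8, 4, 3, 5  # illustrative sizes only
lstms = {n: TinyLSTM(FEAT_DIM, HID_DIM, rng) for n in STREAMS}
W_fc = rand_matrix(NUM_CLASSES, len(STREAMS) * HID_DIM, rng)
# Random vectors stand in for the per-frame CNN features of each stream.
streams = {
    n: [[rng.random() for _ in range(FEAT_DIM)] for _ in range(NUM_FRAMES)]
    for n in STREAMS
}
probs = fuse_streams(streams, lstms, W_fc)
print(probs)
```

In this sketch the fusion is late: each stream is summarized over time on its own before the streams are combined, which matches the order given in the abstract. How the actual model extracts actionness and context features is not specified here and is left out.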

Journal: Image and Vision Computing
Publication Date: Aug 29, 2019
Citations: 14
