Semantic Cues Enhanced Multimodality Multistream CNN for Action Recognition

Zhigang Tu, Baoxin Li, Wei Xie, Junsong Yuan, Justin Dauwels

doi:10.1109/tcsvt.2018.2830102

Abstract

This paper addresses video-based action recognition by exploiting an advanced multistream convolutional neural network (CNN) to fully use semantics-derived multiple modalities in both the spatial (appearance) and temporal (motion) domains, since the performance of CNN-based action recognition methods depends heavily on two factors: semantic visual cues and the network architecture. Our work consists of two major parts. First, to extract useful human-related semantics accurately, we propose a novel spatiotemporal saliency-based video object segmentation (STS) model. By fusing distinctive saliency maps, which are computed according to the object signatures of complementary object detection approaches, refined STS maps can be obtained. In this way, various challenges in realistic video can be handled jointly. Based on the estimated saliency maps, an energy function is constructed to segment two semantic cues: the actor and one distinctive acting part of the actor. Second, we modify the architecture of the two-stream network (TS-Net) to design a multistream network that consists of three TS-Nets with respect to the extracted semantics, which is able to use deeper abstract visual features of multiple modalities at multiple spatiotemporal scales. Importantly, the performance of action recognition is significantly boosted when the captured human-related semantics are integrated into our framework. Experiments on four public benchmarks (JHMDB, HMDB51, UCF-Sports, and UCF101) demonstrate that the proposed method outperforms the state-of-the-art algorithms.
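The abstract describes three TS-Nets, each presumably with an appearance stream and a motion stream, tied to three semantic inputs. The sketch below illustrates one common way such per-stream class scores could be combined: a weighted average of softmax outputs (late fusion). It is only an illustration under stated assumptions. The abstract does not specify the fusion rule, and the stream names (`frame_rgb`, `actor_flow`, and so on), the input regions, the weights, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits, weights=None):
    """Late fusion: weighted average of per-stream softmax scores.

    stream_logits maps a (hypothetical) stream name to its class logits.
    weights maps the same names to non-negative weights; equal weights
    are assumed when none are given.
    """
    names = list(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    num_classes = len(stream_logits[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(stream_logits[name])
        for cls, prob in enumerate(probs):
            fused[cls] += weights[name] * prob / total_weight
    return fused


def predict(stream_logits, weights=None):
    """Return the index of the class with the highest fused score."""
    fused = fuse_streams(stream_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical logits for 3 classes from six streams: three assumed
# input regions (whole frame, actor, acting part), each with an
# appearance (rgb) and a motion (flow) stream.
example_logits = {
    "frame_rgb": [2.0, 1.0, 0.1],
    "frame_flow": [0.5, 2.5, 0.3],
    "actor_rgb": [0.2, 3.0, 0.1],
    "actor_flow": [0.1, 2.0, 0.4],
    "part_rgb": [1.0, 1.2, 0.9],
    "part_flow": [0.3, 2.2, 0.2],
}

fused_scores = fuse_streams(example_logits)
predicted_class = predict(example_logits)
```

Averaging probabilities rather than raw logits keeps each stream's contribution on the same scale, which is why the softmax is applied per stream before weighting. Other fusion schemes (for example, learned fusion layers) are equally plausible readings of the abstract.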


Journal: IEEE transactions on circuits and systems for video technology : a publication of the Circuits and Systems Society
Publication Date: May 1, 2019
Citations: 157

Similar Papers

More efficient and effective tricks for deep action recognition
Zheyuan Liu ... Xiaoteng Zhang
Cluster Computing | VOL. 22
07 Nov 2017

Graph Convolutional Neural Network for Human Action Recognition: A Comprehensive Survey
Tasweer Ahmad ... Lianwen Jin
IEEE transactions on artificial intelligence | VOL. 2
01 Apr 2021

Spatiotemporal Saliency Based Multi-stream Networks for Action Recognition
Zhenbing Liu ... Wanting Ji
01 Jan 2020

Skeleton-based Action Recognition with Lie Group and Deep Neural Networks
Yanshan Li ... Rongiie Xia
01 Jul 2019
