Abstract

In this paper, we propose a novel temporal spiking recurrent neural network (TSRNN) to perform robust action recognition in videos. The proposed TSRNN employs a novel spiking architecture which uses the local discriminative features from high-confidence, reliable frames as spiking signals. The conventional CNN-RNNs typically used for this problem treat all frames as equally important, which makes them prone to errors caused by noisy frames. The TSRNN solves this problem with a temporal pooling architecture that helps the RNN select sparse, reliable frames and enhances its capability in modeling long-range temporal information. In addition, a message-passing bridge is added between the spiking signals and the recurrent unit. In this way, the spiking signals can guide the RNN to protect its long-term memory across multiple frames from contamination caused by noisy frames with distracting factors (e.g., occlusion, rapid scene transition). With these two novel components, TSRNN achieves competitive performance compared with state-of-the-art CNN-RNN architectures on two large-scale public benchmarks, UCF101 and HMDB51.
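To make the two components concrete, the following is a minimal sketch of the idea in PyTorch: a confidence head selects sparse, reliable frames as spiking signals, and a message-passing bridge injects them into the recurrent memory. This is a hedged illustration under assumed shapes and names (`confidence`, `bridge`, `top_k`, the additive correction), not the authors' implementation.

```python
import torch
import torch.nn as nn

class TSRNNSketch(nn.Module):
    """Hypothetical sketch of the two components described above:
    a confidence head that picks sparse, reliable frames as spiking
    signals, and a message-passing bridge that feeds them into the
    recurrent memory. Not the authors' code."""

    def __init__(self, feat_dim: int, hidden_dim: int, top_k: int = 5):
        super().__init__()
        self.top_k = top_k
        self.confidence = nn.Linear(feat_dim, 1)       # per-frame reliability score
        self.rnn_cell = nn.GRUCell(feat_dim, hidden_dim)
        self.bridge = nn.Linear(feat_dim, hidden_dim)  # message-passing bridge

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, feat_dim) CNN features, one row per video frame
        scores = self.confidence(frames).squeeze(-1)   # (T,)
        k = min(self.top_k, frames.size(0))
        spike_mask = torch.zeros_like(scores, dtype=torch.bool)
        spike_mask[scores.topk(k).indices] = True      # sparse, reliable frames

        h = frames.new_zeros(1, self.rnn_cell.hidden_size)
        for t in range(frames.size(0)):
            h = self.rnn_cell(frames[t:t + 1], h)      # ordinary recurrent update
            if spike_mask[t]:
                # Bridge: let the reliable frame's features correct the
                # long-term memory accumulated over possibly noisy frames.
                h = h + torch.tanh(self.bridge(frames[t:t + 1]))
        return h.squeeze(0)                            # final clip representation
```

For example, `TSRNNSketch(512, 256)(torch.randn(16, 512))` would produce a 256-dim clip representation in which only the five highest-confidence frames have passed through the bridge.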

Highlights

  • Human action recognition in videos has drawn growing attention in computer vision, owing to its broad practical applications in many areas such as visual surveillance, behavior analysis, and virtual reality [1]–[5]

  • To recognize actions more robustly even in the presence of noisy frames, we propose a novel temporal spiking recurrent neural network (TSRNN)

  • Our contribution can be summarized as follows: (i) We propose a novel temporal spiking recurrent neural network (TSRNN) where the pooling operation is implemented at the frame level instead of the pixel level (see the sketch below)
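The frame-level versus pixel-level distinction in contribution (i) can be illustrated as follows. The shapes and the stand-in `frame_scores` are assumptions for illustration, not the paper's specification.

```python
import torch

# A clip of T frames, each a C-channel CNN feature map of size H x W
# (illustrative shapes, not from the paper).
T, C, H, W = 16, 512, 7, 7
clip = torch.randn(T, C, H, W)

# Pixel-level pooling (conventional): collapse spatial positions within
# every frame; all T frames survive, noisy or not.
per_frame = clip.amax(dim=(2, 3))                   # (T, C)

# Frame-level pooling (as proposed): score whole frames and keep only
# the k most reliable ones along the temporal axis.
# 'frame_scores' stands in for a learned confidence head.
frame_scores = torch.randn(T)
k = 4
keep = frame_scores.topk(k).indices.sort().values   # preserve temporal order
sparse_frames = per_frame[keep]                     # (k, C): reliable frames only
```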

Summary

INTRODUCTION

Human action recognition in videos has drawn growing attention in computer vision, owing to its broad practical applications in many areas such as visual surveillance, behavior analysis, and virtual reality [1]–[5]. CNN-based methods, however, only learn the local visual appearance of each frame and are limited in modeling long-term cross-frame motion and other dynamics from a global view, leading to inferior performance. To address this issue, some works [8], [9], [18], [19] propose to build recurrent neural networks (RNNs) upon CNNs to capture long-term information. These methods treat the information from all frames as equally important, which inevitably introduces noise from "bad" frames caused by occlusion, fast motion, or rapid scene transition. Such noise contaminates the representations learned by the RNN in an accumulative way, which may bring irreparable damage to the final action recognition result.

RELATED WORK
KEY-FRAME BRANCH
TEMPORAL CONTEXT BRANCH
ACCUMULATIVE LOSS FUNCTION
FUSION OF RGB-TSRNN AND OF-TSRNN
Findings
CONCLUSION