Abstract

This paper proposes a dual-stream 3D space-time convolutional neural network framework for action recognition. The original depth map sequence is used as one input in order to learn the global space-time characteristics of each action category. To exploit the high temporal correlation within a human action, the depth motion map sequence is introduced as the input to the second stream of the 3D space-time convolutional network. In addition, the corresponding 3D skeleton sequence is used as the third input of the overall recognition framework. Although skeleton sequences have the advantage of encoding 3D information directly, they also suffer from rate variation, temporal misalignment and noise, so specially designed space-time features are applied to cope with these problems. The proposed methods allow the recognition system to fully exploit discriminative space-time features from different perspectives and ultimately improve its classification accuracy. Experimental results on several public 3D datasets illustrate the effectiveness of the proposed method.

Highlights

  • As an important branch of computer vision, behavior identification has a wide range of applications, such as intelligent surveillance, medical care, human-computer interaction and virtual reality [1,2].

  • We store the features of the depth sequence produced by the 3D spatial-temporal convolutional neural network, the second 3D convolutional layer and the 3D max pooling layer as the corresponding 3D convolutional features for later use in SVM classification.

  • We use the depth motion maps corresponding to three subsets AS1, AS2 and
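The feature pipeline described in the highlights can be sketched as follows. This is a minimal, hedged illustration (not the authors' exact network): it applies a 3D convolution over a depth-map clip, a second 3D convolution, then 3D max pooling, and flattens the activations into the kind of "3D convolutional feature" vector that would later be classified with an SVM. All filter counts and layer sizes here are illustrative assumptions.

```python
# Illustrative sketch of one stream of the described pipeline:
# 3D conv -> 3D conv -> 3D max pool -> flattened feature vector.
# Filter-bank sizes are assumptions, not the paper's configuration.
import numpy as np
from scipy.ndimage import convolve

def conv3d_relu(clip, kernels):
    """Same-size 3D convolution (zero-padded via scipy) + ReLU, one map per kernel."""
    return np.stack([np.maximum(convolve(clip, k, mode="constant"), 0.0)
                     for k in kernels])

def maxpool3d(maps, s=2):
    """Non-overlapping 3D max pooling with window/stride s on each feature map."""
    n, t, h, w = maps.shape
    t, h, w = t - t % s, h - h % s, w - w % s
    m = maps[:, :t, :h, :w].reshape(n, t // s, s, h // s, s, w // s, s)
    return m.max(axis=(2, 4, 6))

rng = np.random.default_rng(0)
clip = rng.random((16, 32, 32))           # depth-map sequence: T x H x W (dummy data)
k1 = rng.standard_normal((4, 3, 3, 3))    # first 3D filter bank (4 assumed filters)
k2 = rng.standard_normal((3, 3, 3))       # second-stage 3D filter (shared across maps)

maps1 = conv3d_relu(clip, k1)                           # first 3D conv layer
maps2 = np.stack([np.maximum(convolve(m, k2, mode="constant"), 0.0)
                  for m in maps1])                      # second 3D conv layer
features = maxpool3d(maps2).reshape(-1)                 # 3D max pool -> feature vector
# `features` would then be collected per clip and fed to an SVM classifier
# (e.g. scikit-learn's sklearn.svm.SVC), as the highlight describes.
```

In practice the stored feature vectors from all training clips would be stacked into a matrix and used to fit the SVM, with one vector per depth (or depth-motion-map) sequence.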



Introduction

As an important branch of computer vision, behavior identification has a wide range of applications, such as intelligent surveillance, medical care, human-computer interaction and virtual reality [1,2]. With the advent of RGB-D (RGB-Depth) sensor technology, both the RGB image of a scene and its corresponding depth map can be captured simultaneously in real time. Depth maps provide 3D geometric cues that are less sensitive to illumination variations than traditional RGB images [3,4]. A depth sensor can also provide real-time estimates of the 3D joint positions of the human skeleton, which further reduces the effects of cluttered backgrounds and brightness changes. Shotton et al. [5] proposed a powerful human motion capture technique that estimates the 3D joint positions of a human skeleton from a single depth map. As a result, 3D human behavior recognition with RGB-D multimodal sequence data has drawn great attention in recent years.

