Abstract

Human action recognition (HAR) is an important yet challenging task. This paper presents a novel method. First, fuzzy weight functions are used in the computation of depth motion maps (DMMs). Motion information over multiple temporal lengths is also used. These features are referred to as fuzzy weighted multi-resolution DMMs (FWMDMMs). This formulation allows various aspects of individual actions to be emphasized. It also helps to characterise the importance of the temporal dimension, which helps to overcome, e.g., variations in the time over which a single type of action might be performed. A deep convolutional neural network (CNN) motion model is created and trained to extract discriminative and compact features. Transfer learning is also used to extract spatial information from RGB and depth data using the AlexNet network. Different late fusion techniques are then investigated to fuse the deep motion model with the spatial network. The result is a spatial-temporal HAR model. The developed approach is capable of recognising both human actions and human–object interactions. Three public domain datasets are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.

Highlights

  • Human action recognition (HAR) is a challenging field

  • This paper proposes a novel improvement on hand-crafted features based on fuzzy weighted multi-resolution depth motion maps (FWMDMMs)

  • Our method achieves an improvement of 3.88% using FWMDMM with spatial information


Summary

Introduction

Human action recognition (HAR) is a challenging field. This is due to a number of reasons. Many studies on HAR have used deep learning, either with colour image sequences or depth sequences [14,15,16]. Most of these deep networks are based on either pre-extracted hand-crafted features or raw colour/depth sequences as inputs. This paper proposes a novel improvement on hand-crafted features based on fuzzy weighted multi-resolution depth motion maps (FWMDMMs). These help to characterise different important aspects of the depth motion data across multiple time resolutions. The work presented here enhances the recognition system performance by including deep learning. This is used to process the DMM features and to provide more discriminative information for each action.
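To make the idea of a fuzzy weighted, multi-resolution DMM more concrete, the following is a minimal sketch rather than the formulation used in the paper. It assumes that a DMM is accumulated from absolute differences between consecutive depth frames, that the fuzzy weight functions are simple membership shapes over normalised time (increasing, decreasing, triangular), and that "multi-resolution" can be approximated by temporal subsampling. The function names `fuzzy_weights`, `weighted_dmm` and `multi_resolution_dmms`, the membership shapes and the stride values are all hypothetical.

```python
import numpy as np


def fuzzy_weights(num_steps, kind):
    """Return one weight in [0, 1] per frame difference.

    The three membership shapes are illustrative assumptions,
    not the weight functions defined in the paper.
    """
    t = np.linspace(0.0, 1.0, num_steps)
    if kind == "increasing":   # emphasise the end of the action
        return t
    if kind == "decreasing":   # emphasise the start of the action
        return 1.0 - t
    if kind == "triangular":   # emphasise the middle of the action
        return 1.0 - np.abs(2.0 * t - 1.0)
    raise ValueError(f"unknown weight function: {kind}")


def weighted_dmm(depth_frames, weights):
    """Accumulate weighted absolute differences of consecutive frames.

    depth_frames: array of shape (T, H, W); weights: array of shape (T - 1,).
    Returns a single (H, W) motion map.
    """
    frames = np.asarray(depth_frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))           # shape (T - 1, H, W)
    return np.tensordot(weights, diffs, axes=(0, 0))  # shape (H, W)


def multi_resolution_dmms(depth_frames, kind, strides=(1, 2, 4)):
    """Build one weighted DMM per temporal stride (assumed resolutions)."""
    frames = np.asarray(depth_frames, dtype=np.float64)
    maps = []
    for stride in strides:
        sub = frames[::stride]                        # coarser time sampling
        w = fuzzy_weights(len(sub) - 1, kind)
        maps.append(weighted_dmm(sub, w))
    return maps
```

In this sketch, each choice of weight function and stride yields a different motion map of the same action, which is one way the resulting maps could emphasise different parts of the action and be less sensitive to how quickly it is performed. The maps would then be passed to a CNN as input images.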

Related Work
Construction of Fuzzy Weighted Multi-Resolutions Depth Motion Map
Transfer Learning of Spatial Information
Fusing the Spatial Networks
Deep Motion Model
Implementation
Experimental Results and Discussion
Northwestern-UCLA Multi-View Action 3D Dataset
MSR Action 3D Dataset
MSR Daily Activity 3D Dataset
Conclusions
