Abstract

Pose-based action recognition has long been an important research field in computer vision. However, most existing pose-based methods are built solely upon human skeleton data and therefore cannot exploit features of motion-related objects, which provide a crucial clue for discriminating human actions. To address this issue, we propose a novel pose-flow relational model that benefits from both pose dynamics and optical flow. First, we introduce a pose estimation module to extract the skeleton data of the key person from the raw video. Second, a hierarchical pose-based network is proposed to effectively explore the rich spatial–temporal features of human skeleton positions. Third, we embed an inflated 3D network to capture the subtle cues of motion-related objects from optical flow. We evaluate our model on four popular action recognition benchmarks (HMDB-51, JHMDB, sub-JHMDB, and SYSU 3D). Experimental results demonstrate that the proposed model outperforms existing pose-based methods in human action recognition.
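For intuition, here is a minimal PyTorch sketch of the two-stream, late-fusion design the abstract describes: a pose stream over skeleton joint coordinates and a flow stream over stacked optical-flow frames, with their class logits averaged. The module names, layer sizes, and averaged-logit fusion are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class PoseStream(nn.Module):
    """Toy stand-in for the hierarchical pose network:
    encodes (batch, frames, joints, 2) joint coordinates."""
    def __init__(self, num_joints=15, hidden=128, num_classes=51):
        super().__init__()
        self.rnn = nn.LSTM(num_joints * 2, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, joints):                      # (B, T, J, 2)
        b, t, j, c = joints.shape
        out, _ = self.rnn(joints.reshape(b, t, j * c))
        return self.fc(out[:, -1])                  # (B, num_classes)

class FlowStream(nn.Module):
    """Toy stand-in for the inflated 3D (I3D-style) flow network:
    consumes stacked optical-flow frames (B, 2, T, H, W)."""
    def __init__(self, num_classes=51):
        super().__init__()
        self.conv = nn.Conv3d(2, 16, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, flow):
        x = torch.relu(self.conv(flow))
        return self.fc(self.pool(x).flatten(1))

class PoseFlowModel(nn.Module):
    """Late fusion of the two streams (averaged class logits)."""
    def __init__(self, num_classes=51):
        super().__init__()
        self.pose = PoseStream(num_classes=num_classes)
        self.flow = FlowStream(num_classes=num_classes)

    def forward(self, joints, flow):
        return 0.5 * (self.pose(joints) + self.flow(flow))

model = PoseFlowModel()
logits = model(torch.randn(2, 16, 15, 2), torch.randn(2, 2, 16, 32, 32))
print(logits.shape)  # torch.Size([2, 51])
```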

Highlights

  • In the past few years, human action recognition in videos has gained increasing attention for its wide range of applications in smart surveillance systems, human–computer interaction, and motion analysis [1–4]. Challenges remain in practical applications, such as interference from complex backgrounds, body occlusion due to camera position, and pattern blurring due to fast motion.

  • Compared with appearance-based methods, which use red–green–blue (RGB) images or optical flow as the input to the model, pose-based action recognition methods [10] can effectively capture rich spatial–temporal cues of human actions without interference from irrelevant information. The evolution in this direction is mainly attributed to the emergence of deep neural networks, e.g., convolutional neural networks (CNNs), recurrent neural networks (RNNs), and 3D convolutional neural networks.

  • Optical flow is usually represented as a color image in Hue-Saturation-Value (HSV) color space, whereas human pose is represented by the positions of human skeleton joints. A common way to build such an HSV flow image is sketched after this list.
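Below is a minimal sketch of the standard OpenCV recipe for encoding dense optical flow as an HSV image, with hue taken from flow direction and value from flow magnitude. The Farnebäck call and random frames are placeholder inputs for illustration only.

```python
import cv2
import numpy as np

def flow_to_hsv_image(flow):
    """Encode a dense optical flow field (H, W, 2) as an RGB image:
    hue <- flow direction, value <- normalized flow magnitude."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2          # hue: angle mapped to OpenCV range [0, 180)
    hsv[..., 1] = 255                            # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

# Example: flow between two consecutive grayscale frames (random placeholders here)
prev = np.random.randint(0, 255, (120, 160), dtype=np.uint8)
curr = np.random.randint(0, 255, (120, 160), dtype=np.uint8)
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
rgb = flow_to_hsv_image(flow)
```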


Summary

INTRODUCTION

In the past few years, human action recognition in videos has gained increasing attention for its wide range of applications in smart surveillance systems, human–computer interaction, and motion analysis. Challenges remain in practical applications, such as interference from complex backgrounds, body occlusion due to camera position, and pattern blurring due to fast motion. Compared with appearance-based methods, which use red–green–blue (RGB) images or optical flow as the input to the model, pose-based action recognition methods can effectively capture rich spatial–temporal cues of human actions without interference from irrelevant information. The evolution in this direction is mainly attributed to the emergence of deep neural networks, e.g., convolutional neural networks (CNNs), recurrent neural networks (RNNs), and 3D convolutional neural networks. A hierarchical pose-based network is proposed to explore the rich spatial–temporal features of human skeleton data. Experimental results show that the proposed model outperforms existing pose-based methods in human action recognition.
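As a rough illustration of what a hierarchical pose-based network can look like, the sketch below encodes joints part by part (torso, arms, legs) and then fuses the part features into a per-frame body feature. The joint grouping, layer sizes, and fusion scheme are assumptions for illustration, not the paper's exact design; a temporal model (e.g., an LSTM or temporal convolution) would then run over the per-frame features.

```python
import torch
import torch.nn as nn

# Hypothetical grouping of 15 joint indices into five body parts (illustrative only).
PARTS = {
    "torso":     [0, 1, 2],
    "left_arm":  [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg":  [9, 10, 11],
    "right_leg": [12, 13, 14],
}

class HierarchicalPoseEncoder(nn.Module):
    """Encodes per-frame joint coordinates part by part, then fuses
    the part features into one body-level feature per frame."""
    def __init__(self, dim=64):
        super().__init__()
        self.part_mlps = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(len(idx) * 2, dim), nn.ReLU())
            for name, idx in PARTS.items()
        })
        self.body = nn.Sequential(nn.Linear(dim * len(PARTS), dim), nn.ReLU())

    def forward(self, joints):  # joints: (batch, frames, 15 joints, 2 coords)
        parts = [self.part_mlps[name](joints[:, :, idx].flatten(2))
                 for name, idx in PARTS.items()]        # each: (B, T, dim)
        return self.body(torch.cat(parts, dim=-1))      # (B, T, dim)

encoder = HierarchicalPoseEncoder()
feats = encoder(torch.randn(2, 16, 15, 2))
print(feats.shape)  # torch.Size([2, 16, 64])
```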

Human pose estimation
Pose-based action recognition in video
Feature aggregation
Overview
Pose estimation module
Pose-flow relational model
Feature aggregation for action recognition
EXPERIMENTS
Datasets
Implementation details
Ablation studies
Method
Comparison on class accuracies
Comparison with the state-of-the-art methods
Findings
CONCLUSION
