Abstract
The same action can take different amounts of time in different instances. This variation affects the accuracy of action recognition to a certain extent. We propose an end-to-end deep neural network called "Multi-Term Attention Networks" (MTANs), which addresses this problem by extracting temporal features at different time scales. The network consists of a Multi-Term Attention Recurrent Neural Network (MTA-RNN) and a Spatio-Temporal Convolutional Neural Network (ST-CNN). In MTA-RNN, a method for fusing multi-term temporal features is proposed to extract temporal dependencies at different time scales, and the weighted fused temporal feature is recalibrated by an attention mechanism. An ablation study shows that the network has powerful spatio-temporal dynamic modeling capabilities for actions with different time scales. We perform extensive experiments on four challenging benchmark datasets: the NTU RGB+D, UT-Kinect, Northwestern-UCLA, and UWA3DII datasets. Our method outperforms state-of-the-art benchmarks, which demonstrates the effectiveness of MTANs.
Highlights
Human Action Recognition (HAR) has attracted the attention of research communities in the computer vision area in recent years
The problem is that general action recognition methods can only extract single-term temporal features, so their ability to model the spatio-temporal dynamics of actions with different time scales is limited
For the input skeleton sequences, a multi-LSTM module based on temporal sliding is used to capture temporal feature information at different terms, and the temporal feature is recalibrated by the Attention Recalibration Module (ARM)
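To illustrate the multi-term idea above, here is a minimal NumPy sketch that pools a per-frame skeleton feature sequence over sliding windows of several sizes (the "terms") and fuses the per-term summaries with softmax attention weights. This is a hedged stand-in, not the paper's implementation: the actual MT-TS-LSTM uses LSTMs rather than mean pooling, and its attention scores are learned, whereas the `scores` here are a simple hand-crafted placeholder. All function and variable names are hypothetical.

```python
import numpy as np

def multi_term_attention_fusion(seq, window_sizes=(2, 4, 8)):
    """Illustrative sketch (not the paper's model): summarize a sequence
    at several temporal scales, then fuse the summaries with attention.

    seq: (T, D) array of per-frame skeleton features.
    Returns a single (D,) fused temporal feature vector.
    """
    T, D = seq.shape
    term_features = []
    for w in window_sizes:
        w = min(w, T)  # guard against windows longer than the sequence
        # Sliding-window mean pooling at this time scale
        # (the paper uses temporal-sliding LSTMs instead).
        windows = np.stack(
            [seq[t:t + w].mean(axis=0) for t in range(T - w + 1)]
        )
        # Max-pool over window positions to get one summary per term.
        term_features.append(windows.max(axis=0))
    F = np.stack(term_features)          # (n_terms, D)
    # Attention-style recalibration: softmax over per-term scores.
    # Placeholder scores; a trained ARM would produce these instead.
    scores = F.mean(axis=1)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ F                   # weighted fusion, shape (D,)

# Usage: fuse a random 16-frame, 8-dimensional feature sequence.
seq = np.random.default_rng(0).normal(size=(16, 8))
fused = multi_term_attention_fusion(seq)
```

The softmax weighting is what lets the fused feature emphasize whichever time scale best matches the action's duration, which is the core motivation stated in the highlights.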
Summary
Human Action Recognition (HAR) has attracted the attention of research communities in the computer vision area in recent years. The problem is that general action recognition methods can only extract single-term temporal features, so their ability to model the spatio-temporal dynamics of actions with different time scales is limited. The Multi-Term Temporal Sliding LSTM (MT-TS-LSTM) is introduced in MTA-RNN to extract features at different time scales. We propose general Multi-Term Attention Networks (MTANs) for skeleton-based action recognition. The networks introduce a method for fusing multi-term temporal features to solve the recognition problem for actions with large time-scale differences. Networks with our strategy are able to reinforce temporal features for classifying actions