Abstract

One of the key challenges in skeleton-based action recognition (SAR) is the complex nature of human motion patterns. Variations across performers and viewpoints can degrade recognition accuracy. In this work, we propose the Multi-Localized Sensitive Autoencoder-Attention-LSTM (Multi-LiSAAL) for SAR. The Localized Stochastic Sensitive Autoencoder (LiSSA) encodes both spatial and temporal information and extracts meaningful features from different parts of the skeleton (four limbs and the trunk). The LiSSA is trained by minimizing the localized generalization error, which enhances the robustness of the autoencoder by reducing its sensitivity to small variations in its inputs. An attention mechanism assigns different weights to the skeleton parts so that the model focuses on the more informative ones. A backbone classifier network then takes the weighted features as input to differentiate actions. Experimental results on five public benchmark datasets show that Multi-LiSAAL outperforms state-of-the-art methods.
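To make the part-wise attention idea concrete, the following is a minimal sketch (not the paper's actual implementation) of weighting feature vectors for five skeleton parts, four limbs and a trunk, with a softmax attention score before passing them to a classifier. All names, dimensions, and the scoring function are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of part-wise attention over skeleton features.
# Five parts (four limbs + trunk), each already encoded into a feature
# vector by an autoencoder; dimensions are illustrative only.
rng = np.random.default_rng(0)
num_parts, feat_dim = 5, 64
part_features = rng.standard_normal((num_parts, feat_dim))

# A simple learned scoring vector (randomly initialized here).
w = rng.standard_normal((feat_dim, 1))

# Score each part, then softmax (numerically stable) to get weights.
scores = part_features @ w                      # shape (5, 1)
scores -= scores.max()                          # stability shift
weights = np.exp(scores) / np.exp(scores).sum() # softmax over parts

# Scale each part's features by its attention weight before the
# classifier backbone consumes them.
weighted_features = weights * part_features     # shape (5, 64)
```

In this sketch the weights sum to one across the five parts, so more informative parts contribute proportionally more to the downstream classifier input.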
