Abstract

One of the key challenges in skeleton-based action recognition (SAR) is the complex nature of human motion patterns. Variations across performers and viewpoints can degrade recognition accuracy. In this work, we propose the Multi-Localized Sensitive Autoencoder-Attention-LSTM (Multi-LiSAAL) for SAR. The Localized Stochastic Sensitive Autoencoder (LiSSA) encodes both spatial and temporal information and extracts meaningful features from different parts of the skeleton (four limbs and the trunk). The LiSSA is trained by minimizing the localized generalization error, which enhances the robustness of the autoencoder by reducing its sensitivity to small variations in its inputs. An attention mechanism assigns different weights to the skeleton parts so that the model focuses on the more informative ones. A backbone classifier network then takes the weighted features as input to differentiate actions. Experimental results on five public benchmark datasets show that Multi-LiSAAL outperforms state-of-the-art methods.
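To make the part-wise attention idea concrete, the following is a minimal sketch (not the paper's actual implementation) of weighting feature vectors for five skeleton parts, four limbs and a trunk, with a softmax attention score before passing them to a classifier. All names, dimensions, and the scoring function are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of part-wise attention over skeleton features.
# Five parts (four limbs + trunk), each already encoded into a feature
# vector by an autoencoder; dimensions are illustrative only.
rng = np.random.default_rng(0)
num_parts, feat_dim = 5, 64
part_features = rng.standard_normal((num_parts, feat_dim))

# A simple learned scoring vector (randomly initialized here).
w = rng.standard_normal((feat_dim, 1))

# Score each part, then softmax (numerically stable) to get weights.
scores = part_features @ w                      # shape (5, 1)
scores -= scores.max()                          # stability shift
weights = np.exp(scores) / np.exp(scores).sum() # softmax over parts

# Scale each part's features by its attention weight before the
# classifier backbone consumes them.
weighted_features = weights * part_features     # shape (5, 64)
```

In this sketch the weights sum to one across the five parts, so more informative parts contribute proportionally more to the downstream classifier input.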
