Abstract

Skeleton-based action recognition has been widely investigated owing to its strong adaptability to dynamic circumstances and complex backgrounds. To recognize different actions from skeleton sequences, it is crucial to model the posture of the human represented by the skeleton and its changes in the temporal dimension. However, most existing works treat the temporal and spatial dimensions of skeleton sequences in the same way, ignoring the difference between the two dimensions in skeleton data, which is not an optimal way to model skeleton sequences. We propose to model the posture represented by the skeleton in each frame individually, while also capturing the movement of the entire skeleton in the temporal dimension. Accordingly, we design a Spatial Transformer Block and a Directional Temporal Transformer Block for modeling skeleton sequences in the spatial and temporal dimensions, respectively. Due to occlusion, sensor error, raw video quality, etc., the extracted skeleton data contain noise in both the temporal and spatial dimensions, which reduces the recognition capability of models. To adapt to this imperfect information condition, we propose a multi-task self-supervised learning method that provides confusing samples under different conditions to improve the robustness of our model. Combining the above designs, we propose our Spatial-Temporal Specialized Transformer (STST) and conduct experiments on the SHREC, NTU-RGB+D, and Kinetics-Skeleton datasets. Extensive experimental results and analysis demonstrate the improved performance of the proposed method.
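The central idea above is to give the spatial dimension (joints within one frame) and the temporal dimension (one joint across frames) their own attention blocks, rather than attending over a flattened sequence. Below is a minimal sketch of that separation, not the authors' STST implementation: the module names, tensor layout, and use of a causal mask to approximate "directional" temporal attention are all illustrative assumptions.

```python
# Minimal sketch (assumed layout, not the paper's code): attention is applied
# separately over joints per frame (spatial) and over frames per joint (temporal).
import torch
import torch.nn as nn

class SpatialTransformerBlock(nn.Module):
    """Attends over joints within each frame to model per-frame posture."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):               # x: (batch, frames, joints, dim)
        b, t, j, d = x.shape
        x = x.reshape(b * t, j, d)      # treat each frame independently
        out, _ = self.attn(x, x, x)     # attention across joints
        x = self.norm(x + out)
        return x.reshape(b, t, j, d)

class TemporalTransformerBlock(nn.Module):
    """Attends over frames for each joint; a causal mask gives one direction."""
    def __init__(self, dim, heads=4, causal=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.causal = causal

    def forward(self, x):               # x: (batch, frames, joints, dim)
        b, t, j, d = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b * j, t, d)  # sequence over time
        mask = None
        if self.causal:                 # mask future frames (forward direction)
            mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm(x + out)
        return x.reshape(b, j, t, d).permute(0, 2, 1, 3)

# Usage: 25 joints matches the NTU-RGB+D skeleton; other sizes are arbitrary.
feats = torch.randn(8, 64, 25, 128)     # (batch, frames, joints, channels)
feats = SpatialTransformerBlock(128)(feats)
feats = TemporalTransformerBlock(128)(feats)
print(feats.shape)                       # torch.Size([8, 64, 25, 128])
```

The spatial block reshapes so each frame is its own attention batch, so posture is modeled without interference from other frames; the temporal block does the converse per joint. A backward-direction block would simply transpose the mask.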
