Abstract

Multi-frame human pose estimation has long been an appealing and fundamental problem in visual perception. Owing to the frequent rapid motion and pose occlusion in videos, this task is extremely challenging. Current state-of-the-art methods seek to model spatiotemporal features by fusing each frame in the local sequence equally, which weakens the target frame information. In addition, existing approaches usually emphasize deep features while ignoring the detailed information carried by the shallow feature maps, resulting in the loss of crucial features. To address the above problems, we propose an effective framework, namely the spatiotemporal learning transformer for video-based human pose estimation (SLT-Pose), which consists of a Personalized Feature Extraction Module (PFEM), a Self-feature Refinement Module (SRM), a Cross-frame Temporal Learning Module (CTLM), and a Disentangled Keypoint Detector (DKD). Specifically, the PFEM extracts and modulates individual frame features to adapt to varying human shapes, and integrates single-frame features to obtain spatiotemporal features. We further present the SRM to establish globally correlated spatial cues on the target frame and obtain refined features. Then, the CTLM is designed to search the spatiotemporal features for the information most closely related to the target frame, intensifying the interaction between the target frame and the local sequence using both shallow detailed and deep semantic representations. Finally, we employ the DKD to extract the disentangled characteristics of each joint and encode the articulated joint pairs in the human body, enabling the model to predict keypoint heatmaps reasonably and accurately.
Extensive experiments on three human motion benchmarks, including the PoseTrack2017, PoseTrack2018, and Sub-JHMDB datasets, demonstrate that SLT-Pose performs favorably against state-of-the-art approaches in terms of both objective evaluation and subjective visual quality.
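The core idea behind the CTLM, as described above, is to let the target frame query the fused spatiotemporal features so that only the most relevant sequence information is aggregated, rather than averaging all frames equally. The sketch below illustrates that idea with a plain scaled dot-product cross-attention in NumPy; the function name, token layout, and feature dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(target_tokens, sequence_tokens):
    """Hypothetical sketch of target-frame-guided aggregation.

    target_tokens:   (n_tokens, d) features of the target frame (queries)
    sequence_tokens: (n_frames * n_tokens, d) fused spatiotemporal
                     features of the local sequence (keys and values)
    Returns (n_tokens, d): sequence information re-weighted by its
    relevance to the target frame, instead of a uniform average.
    """
    d = target_tokens.shape[-1]
    # Similarity between each target token and every sequence token.
    scores = target_tokens @ sequence_tokens.T / np.sqrt(d)
    # Each target token distributes attention over the whole sequence.
    weights = softmax(scores, axis=-1)
    # Aggregate sequence features according to their relevance.
    return weights @ sequence_tokens
```

With identical (uniform) inputs the attention weights are uniform and the output reduces to a plain average, which matches the equal-fusion baseline the abstract contrasts against; with informative features, the softmax concentrates on sequence tokens similar to the target frame.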
