TSwinPose: Enhanced monocular 3D human pose estimation with JointFlow

Muyu Li, Henan Hu, Jingjing Xiong, Xudong Zhao, Hong Yan

doi:10.1016/j.eswa.2024.123545

Abstract

Monocular estimation of 3D human poses is challenging because of depth ambiguity and partial occlusion. Most recent works frame it as a 2D-to-3D lifting task that takes 2D key point sequences as input and exploits their spatial and temporal relationships. However, prior works focus on capturing spatio-temporal correlations and ignore the motion of joints, which is needed for continuous estimation. To extend the potential of 2D-to-3D pose estimation, we propose TSwinPose, which learns multi-scale spatio-temporal representations from 2D key point locations and patterns of motion. The input 2D key point sequences are enhanced by JointFlow, which encodes the motion of each human joint. Building on the Swin Transformer, we design a temporal-domain Swin-Unet structure to model multi-scale spatio-temporal relationships of human joints across different temporal windows. The final 3D pose, generated from multi-stage representations, is temporally consistent and more accurate. Experiments on three benchmark datasets, Human3.6M, MPI-INF-3DHP, and HumanEva-I, demonstrate that TSwinPose performs on par with state-of-the-art methods. Moreover, introducing JointFlow as a plug-in extension improves performance significantly, particularly for long-term 2D-to-3D lifting methods for human pose estimation.
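The abstract says only that JointFlow "encodes the motion of each human joint" and does not give its formulation. As a rough illustration of the general idea of augmenting 2D key points with per-joint motion, the sketch below appends each joint's frame-to-frame displacement to its 2D coordinates. The function name `add_joint_motion`, the displacement-based encoding, and the zero motion assigned to the first frame are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
PointWithMotion = Tuple[float, float, float, float]


def add_joint_motion(
    sequence: Sequence[Sequence[Point2D]],
) -> List[List[PointWithMotion]]:
    """Append a simple motion cue to every 2D key point.

    `sequence` is a list of frames; each frame is a list of (x, y)
    joint positions in a fixed joint order. The result has the same
    layout, with each joint extended to (x, y, dx, dy), where
    (dx, dy) is the displacement from the previous frame.

    Hypothetical encoding: plain frame-to-frame differences, with
    zero motion assumed for the first frame.
    """
    augmented: List[List[PointWithMotion]] = []
    previous = None
    for frame in sequence:
        if previous is not None and len(frame) != len(previous):
            raise ValueError("every frame must contain the same joints")
        row: List[PointWithMotion] = []
        for j, (x, y) in enumerate(frame):
            if previous is None:
                dx, dy = 0.0, 0.0  # no earlier frame to compare against
            else:
                px, py = previous[j]
                dx, dy = x - px, y - py
            row.append((x, y, dx, dy))
        augmented.append(row)
        previous = frame
    return augmented
```

A lifting network that normally consumes a sequence of shape (frames, joints, 2) would, under this assumption, receive (frames, joints, 4) instead, which is one way a motion cue could be added as a plug-in without changing the rest of the model.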

Journal: Expert Systems with Applications
Publication Date: Feb 27, 2024
Citations: 1

Similar Papers

Adapted human pose: monocular 3D human pose estimation with zero real 3D pose data
Shuangjun Liu ... Naveen Sehgal
Applied Intelligence | VOL. 52
08 Mar 2022

Multi-View Pose Generator Based on Deep Learning for Monocular 3D Human Pose Estimation
Jun Sun ... Dejun Zhang
Symmetry | VOL. 12
04 Jul 2020

Dual Networks Based 3D Multi-Person Pose Estimation From Monocular Video
Yu Cheng ... Robby T Tan
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 45
01 Feb 2023

A Multi-Task Neural Network for Action Recognition with 3D Key-Points
Rongxiao Tang ... Luyang Wang
10 Jan 2021
