Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet

Shihao Zou, Li Cheng, Lingni Ma, Chao Li, Yuanlu Xu, Minh Vo

doi:10.1109/tcsvt.2023.3244152

Abstract

Multi-person pose understanding from RGB videos involves three complex tasks: pose estimation, tracking, and motion forecasting. Intuitively, accurate multi-person pose estimation facilitates robust tracking, and robust tracking builds the history needed for correct motion forecasting. Most existing works either focus on a single task or employ multi-stage approaches that solve the tasks separately, which tends to produce sub-optimal decisions at each stage and fails to exploit correlations among the three tasks. In this paper, we propose Snipper, a unified framework that performs multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single stage. We propose an efficient yet powerful deformable attention mechanism to aggregate spatiotemporal information from the video snippet. Building upon this deformable attention, a video transformer is learned to encode the spatiotemporal features from the multi-frame snippet and to decode informative pose features for multi-person pose queries. Finally, these pose queries are regressed to predict multi-person pose trajectories and future motions in a single shot. In the experiments, we show the effectiveness of Snipper on three challenging public datasets, where our generic model rivals specialized state-of-the-art baselines for pose estimation, tracking, and forecasting. Code is available at https://github.com/JimmyZou/Snipper.
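
The abstract does not spell out how the deformable attention is computed, so the following is only a minimal, generic sketch of the idea of sampling a spatiotemporal feature volume at a few offset locations around a reference point and combining the samples with softmax weights. It is not the authors' implementation. All names (`trilinear_sample`, `deformable_attend`), the nested-list feature layout `feats[T][H][W][C]`, and the choice to pass offsets and attention logits in directly (rather than predicting them from a query with learned projections) are assumptions made for illustration.

```python
import math


def trilinear_sample(feats, t, y, x):
    """Sample a C-dim feature at continuous (t, y, x) from feats[T][H][W][C].

    Coordinates are in index units and are clamped to the volume bounds.
    """
    T, H, W = len(feats), len(feats[0]), len(feats[0][0])
    C = len(feats[0][0][0])
    # Clamp the sampling location to the valid volume.
    t = min(max(t, 0.0), T - 1.0)
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    t0, y0, x0 = int(math.floor(t)), int(math.floor(y)), int(math.floor(x))
    t1, y1, x1 = min(t0 + 1, T - 1), min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    ft, fy, fx = t - t0, y - y0, x - x0
    out = [0.0] * C
    # Blend the 8 surrounding grid cells with trilinear weights.
    for ti, wt in ((t0, 1 - ft), (t1, ft)):
        for yi, wy in ((y0, 1 - fy), (y1, fy)):
            for xi, wx in ((x0, 1 - fx), (x1, fx)):
                w = wt * wy * wx
                for c in range(C):
                    out[c] += w * feats[ti][yi][xi][c]
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def deformable_attend(feats, ref, offsets, logits):
    """Aggregate features sampled at ref + offset, weighted by softmax(logits).

    ref:     (t, y, x) reference point for one query (hypothetical layout).
    offsets: list of K (dt, dy, dx) sampling offsets; assumed given here,
             whereas a trained model would predict them from the query.
    logits:  list of K unnormalized attention scores, also assumed given.
    """
    weights = softmax(logits)
    C = len(feats[0][0][0])
    out = [0.0] * C
    for (dt, dy, dx), w in zip(offsets, weights):
        sample = trilinear_sample(feats, ref[0] + dt, ref[1] + dy, ref[2] + dx)
        for c in range(C):
            out[c] += w * sample[c]
    return out
```

Because each query touches only K sampled locations instead of every position in every frame, the cost per query in this sketch grows with K rather than with the size of the snippet, which is the usual motivation for deformable attention over dense attention.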


Journal: IEEE Transactions on Circuits and Systems for Video Technology: a publication of the Circuits and Systems Society
Publication Date: Sep 1, 2023
Citations: 3

Similar Papers

Argoverse: 3D Tracking and Forecasting With Rich Maps
Ming-Fang Chang ... Jagjeet Singh
01 Jun 2019

Pose Estimation for Ground Robots: On Manifold Representation, Integration, Reparameterization, and Optimization
Mingming Zhang ... Yong Liu
IEEE Transactions on Robotics | VOL. 37
03 Jan 2021

DetPoseNet: Improving Multi-Person Pose Estimation via Coarse-Pose Filtering
Lipeng Ke ... Honggang Qi
IEEE Transactions on Image Processing | VOL. 31
01 Jan 2021

Robustify Hand Tracking by Fusing Generative and Discriminative Methods
Le Thanh Ha ... Nguyen Viet Anh
VNU Journal of Science: Computer Science and Communication Engineering | VOL. 37
17 Feb 2021
