Abstract
Human motion prediction, the task of forecasting future poses from an observed sequence, is crucial for advancing human-centric computer vision. Significant progress has been made by employing fixed or adaptively learned spatiotemporal Graph Neural Networks (GNNs) on skeleton sequences. However, many existing GNN-based methods yield counterfactual predictions because they overlook the inherent correlations between body parts that are essential for balance and coordination. In this study, we propose an end-to-end Spatio-Temporal Articulation & Coordination Co-attention Graph Network (ST-A&CCGN) that leverages both the physical and kinematic regularities of the human motion system. Specifically, we construct articulated and coordination graphs as prior knowledge to explicitly model the connectivity among articulated body parts and their coordination. We then design a consistent module employing a parameter-sharing strategy to extract embeddings that consistently represent both physical articulation and dynamic coordination. An attention mechanism is introduced to simultaneously learn adaptive weights for the articulated, coordinated, and consistent embeddings. Finally, we use a multi-head temporal encoding module to capture temporal dependencies among the spatial features. Extensive qualitative and quantitative evaluations on three widely used benchmarks, namely Human3.6M, AMASS, and 3DPW, validate the effectiveness of the proposed framework. Ablation studies further highlight the importance of integrating the various modules. Notably, our model surpasses state-of-the-art (SOTA) competitors.
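To make the described pipeline concrete, the following is a minimal, illustrative sketch of how the components named in the abstract (articulated and coordination graphs, a parameter-shared consistent branch, attention over the three embeddings, and multi-head temporal encoding) could be composed. It is not the authors' released implementation; the class names (`STACCGNSketch`, `GraphConv`), shapes, dimensions, and the specific layer choices are assumptions made purely for exposition.

```python
# Illustrative sketch only: module names, shapes, and layer choices are assumptions,
# not the authors' implementation of ST-A&CCGN.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """Simple graph convolution: mixes joint features through an adjacency matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, time, joints, in_dim); adj: (joints, joints)
        return self.proj(torch.einsum("ij,btjc->btic", adj, x))


class STACCGNSketch(nn.Module):
    """Hypothetical reading of the pipeline: articulated, coordination, and
    parameter-shared 'consistent' graph branches fused by attention, followed
    by multi-head temporal self-attention."""
    def __init__(self, n_joints=22, in_dim=3, hid_dim=64, n_heads=4, pred_len=10):
        super().__init__()
        # Fixed articulated graph given as prior knowledge (identity used here as a
        # placeholder; in practice this would encode skeletal connectivity).
        self.register_buffer("adj_art", torch.eye(n_joints))
        # Learnable coordination graph capturing inter-part correlations.
        self.adj_coord = nn.Parameter(torch.eye(n_joints))
        # One shared GraphConv stands in for the parameter-sharing 'consistent' idea:
        # the same weights process both graph structures.
        self.gc_shared = GraphConv(in_dim, hid_dim)
        # Attention weights over the three spatial embeddings.
        self.branch_attn = nn.Linear(hid_dim, 1)
        # Multi-head temporal encoder over the fused spatial features.
        self.temporal = nn.MultiheadAttention(hid_dim, n_heads, batch_first=True)
        self.head = nn.Linear(hid_dim, in_dim)
        self.pred_len = pred_len

    def forward(self, x):
        # x: (batch, time, joints, coords)
        b, t, j, c = x.shape
        e_art = self.gc_shared(x, self.adj_art)                   # articulated branch
        e_coord = self.gc_shared(x, self.adj_coord.softmax(-1))   # coordination branch
        e_cons = 0.5 * (e_art + e_coord)                          # consistent branch
        branches = torch.stack([e_art, e_coord, e_cons], dim=-2)  # (b, t, j, 3, h)
        w = F.softmax(self.branch_attn(branches), dim=-2)         # adaptive branch weights
        fused = (w * branches).sum(dim=-2)                        # (b, t, j, h)
        seq = fused.permute(0, 2, 1, 3).reshape(b * j, t, -1)     # per-joint time sequences
        enc, _ = self.temporal(seq, seq, seq)                     # temporal dependencies
        out = self.head(enc[:, -self.pred_len:, :])               # last steps -> future poses
        return out.reshape(b, j, self.pred_len, c).permute(0, 2, 1, 3)


if __name__ == "__main__":
    model = STACCGNSketch()
    past = torch.randn(2, 50, 22, 3)   # 50 observed frames, 22 joints, xyz coordinates
    future = model(past)
    print(future.shape)                # (2, 10, 22, 3): 10 predicted frames
```

The sketch keeps the fusion step simple (a softmax-weighted sum over the three branch embeddings); the paper's co-attention mechanism and temporal encoder are more elaborate, but the data flow above mirrors the order in which the abstract introduces the modules.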