Abstract

Forecasting human motion from a sequence of observed poses is an important problem in computer vision and robotics. Most previous approaches learn only the temporal dynamics of body joints or joint angles, neglecting the derivatives of joint trajectories (i.e., pose velocities), which can reduce the impact of noise and improve stability. To exploit the benefits of pose velocities, we propose a velocity-to-velocity learning paradigm for human motion prediction that builds the sequence-to-sequence model directly in velocity space. Two variant architectures based on recurrent encoder-decoder networks are introduced under this paradigm. Viewing human motion as the kinematics of articulated rigid bodies, the joint angles, which parameterize rigid-body transformations, are obtained through inverse kinematics. Accordingly, we design a novel training loss defined on rotation matrices, computed through a rotation matrix transformation (RMT) layer. Finally, we present an effective training algorithm that exploits sequence transformations to improve model generalization. Our approaches substantially outperform state-of-the-art methods on two large-scale datasets, Human3.6M and the CMU Motion Capture dataset, for both short-term and long-term prediction. In particular, our model can forecast human-like and meaningful poses up to 1000 milliseconds ahead. The code is available on GitHub: https://github.com/hongsong-wang/RNN_based_human_motion_prediction.
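To make the velocity-to-velocity idea concrete, the following is a minimal PyTorch sketch, not the paper's exact architecture: velocities are taken as first-order differences of consecutive poses, a recurrent encoder-decoder predicts future velocities autoregressively, and a cumulative sum integrates them back to poses. The class name VelocitySeq2Seq, the GRU choice, and all hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VelocitySeq2Seq(nn.Module):
    """Hypothetical recurrent encoder-decoder operating in velocity space."""
    def __init__(self, pose_dim, hidden_dim=256):
        super().__init__()
        self.encoder = nn.GRU(pose_dim, hidden_dim, batch_first=True)
        self.decoder_cell = nn.GRUCell(pose_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, pose_dim)

    def forward(self, poses, horizon):
        # poses: (batch, T, pose_dim) observed joint-angle sequence.
        # Velocities are first-order differences of consecutive poses.
        vel = poses[:, 1:] - poses[:, :-1]          # (batch, T-1, pose_dim)
        _, h = self.encoder(vel)                    # encode observed velocities
        h = h.squeeze(0)
        v = vel[:, -1]                              # seed with last observed velocity
        pred_vel = []
        for _ in range(horizon):                    # decode autoregressively in velocity space
            h = self.decoder_cell(v, h)
            v = self.out(h)
            pred_vel.append(v)
        pred_vel = torch.stack(pred_vel, dim=1)     # (batch, horizon, pose_dim)
        # Integrate velocities back to poses, starting from the last observed pose.
        return poses[:, -1:] + torch.cumsum(pred_vel, dim=1)

model = VelocitySeq2Seq(pose_dim=54)
obs = torch.randn(8, 50, 54)        # e.g. 50 observed frames of joint angles
future = model(obs, horizon=25)     # predict 25 future frames
```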

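In the same spirit, here is a rough sketch of a rotation-matrix loss like the one the RMT layer enables, assuming joint angles are given as per-joint exponential maps (axis-angle vectors), as is common for Human3.6M. The names expmap_to_rotmat and rmt_loss are hypothetical, and the squared Frobenius distance is one plausible choice of matrix loss rather than the paper's exact formulation.

```python
import torch

def expmap_to_rotmat(r):
    """Rodrigues' formula: axis-angle vectors (..., 3) -> rotation matrices (..., 3, 3)."""
    theta = r.norm(dim=-1, keepdim=True).clamp(min=1e-8)   # rotation angle
    k = r / theta                                          # unit rotation axis
    kx, ky, kz = k.unbind(-1)
    zero = torch.zeros_like(kx)
    # Skew-symmetric cross-product matrix K of the axis.
    K = torch.stack([zero, -kz, ky,
                     kz, zero, -kx,
                     -ky, kx, zero], dim=-1).reshape(*r.shape[:-1], 3, 3)
    eye = torch.eye(3, device=r.device).expand(*r.shape[:-1], 3, 3)
    s = torch.sin(theta)[..., None]
    c = torch.cos(theta)[..., None]
    return eye + s * K + (1 - c) * (K @ K)

def rmt_loss(pred_angles, gt_angles):
    """Hypothetical RMT-style loss: compare rotation matrices rather than raw angles."""
    # Assumes the last dimension groups into per-joint 3D exponential maps.
    R_pred = expmap_to_rotmat(pred_angles.reshape(-1, 3))
    R_gt = expmap_to_rotmat(gt_angles.reshape(-1, 3))
    return ((R_pred - R_gt) ** 2).sum(dim=(-2, -1)).mean()
```

Measuring error on rotation matrices avoids the ambiguities of angle parameterizations (e.g., 2π wrap-around), which is the motivation for placing the RMT layer before the loss.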