The rapid, accurate, and robust computation of virtual human figures' "in-between" pose transitions from available, and sometimes sparse, inputs is of fundamental significance to 3D interactive graphics and computer animation. Various methods have been proposed in recent decades to produce natural, lifelike human pose transitions automatically. Nevertheless, conventional purely model-driven methods require heuristic knowledge (e.g., least motion guided by physical laws) and ad hoc cues (e.g., splines with non-uniform time warps) that are difficult to obtain, learn, and infer. With the recent emergence of large-scale datasets readily available to animators, deep models offer a powerful alternative for tackling these challenges. However, purely data-driven methods still struggle with challenges such as unseen data in practice and limited generative power under model/domain/data transfer, and the measurement of this generative power has largely been omitted in prior work. In essence, data-driven methods rely solely on the quality and quantity of their training datasets. In this paper, we propose a hybrid approach, called Dynamic Motion Transition (DMT), built upon the seamless integration of data-driven and model-driven methods, with the following salient modeling advantages: (1) data augmentation from limited human locomotion capture, based on the concept of force derived directly from physical laws; (2) at the fine level, learning of the forces that drive skeleton joints to move, with a Conditional Temporal Transformer (CTT) trained to capture local force changes; and (3) at the coarse level, effective and flexible generation of the subsequent step motion using Dynamic Movement Primitives (DMP) until the target is reached. Extensive experiments confirm that our model outperforms state-of-the-art methods under the newly devised metric, by virtue of the least-action loss function. In addition, our method and system are of immediate benefit to many other animation tasks, such as motion synthesis and control, and motion tracking and prediction, in this big-data graphics era.
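To make the abstract's notion of physics-derived force concrete, the sketch below estimates per-joint forces from captured joint trajectories via Newton's second law (F = m·a) with finite-difference accelerations. This is a minimal illustration of the general idea; the function name `joint_forces`, the per-joint masses, and the 30 Hz frame rate are assumptions for the example, not details from the paper.

```python
# Minimal sketch: derive per-joint "forces" from captured joint trajectories
# via Newton's second law (F = m * a), the kind of physics-based signal the
# abstract describes for data augmentation. Masses and frame rate are
# illustrative assumptions, not values from the paper.
import numpy as np

def joint_forces(positions: np.ndarray, masses: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """positions: (T, J, 3) joint trajectories; masses: (J,) per-joint mass.

    Returns (T, J, 3) force estimates using central-difference acceleration.
    """
    dt = 1.0 / fps
    acc = np.zeros_like(positions)
    # central second difference for interior frames
    acc[1:-1] = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    # replicate boundary accelerations rather than extrapolating
    acc[0], acc[-1] = acc[1], acc[-2]
    return masses[None, :, None] * acc  # F = m * a, broadcast over frames and axes

# Example: 120 frames of a 22-joint skeleton with unit masses.
rng = np.random.default_rng(0)
pos = rng.standard_normal((120, 22, 3)).cumsum(axis=0) * 0.01  # synthetic random-walk motion
forces = joint_forces(pos, masses=np.ones(22))
print(forces.shape)  # (120, 22, 3)
```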
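Likewise, the coarse-level generation step names Dynamic Movement Primitives. The sketch below rolls out a one-dimensional DMP in the standard Ijspeert-style formulation, driving a state toward a goal until the target is reached; it is a generic DMP illustration under assumed gains, basis count, and random forcing weights, not the paper's implementation.

```python
# Minimal one-dimensional Dynamic Movement Primitive (DMP) rollout, in the
# standard Ijspeert-style formulation: a critically damped spring-damper
# toward the goal g, modulated by a learned forcing term that fades as the
# canonical phase x decays. Gains and weights here are illustrative.
import numpy as np

def dmp_rollout(y0, g, weights, tau=1.0, dt=0.01, alpha_z=25.0, alpha_x=4.0):
    beta_z = alpha_z / 4.0                       # critical damping
    n = len(weights)
    c = np.exp(-alpha_x * np.linspace(0, 1, n))  # basis centers along the phase
    h = n ** 1.5 / c                             # common heuristic for basis widths
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(tau / dt)):
        psi = np.exp(-h * (x - c) ** 2)          # Gaussian basis activations
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (g - y0)  # forcing term
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)           # canonical phase decays to 0
        traj.append(y)
    return np.asarray(traj)

# Example: drive a joint value from 0.0 to the target 1.0 with random weights.
traj = dmp_rollout(y0=0.0, g=1.0, weights=np.random.default_rng(1).standard_normal(20) * 5.0)
print(traj[0], traj[-1])  # starts at y0, converges near the goal g
```

Because the forcing term vanishes with the canonical phase, the critically damped transformation system guarantees convergence to the goal, which is what makes DMP a natural fit for "generate motion until the target is reached" control.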