Abstract

The emergence of learning‐based motion in‐betweening techniques offers animators a more efficient way to animate characters. However, existing non‐generative methods either struggle to support long transition generation or produce results that lack diversity. Meanwhile, diffusion models have shown promising results in synthesizing diverse and high‐quality motions driven by text and keyframes. In these methods, however, keyframes often serve as a guide rather than a strict constraint and can sometimes be ignored when they are sparse. To address these issues, we propose a lightweight yet effective diffusion‐based motion in‐betweening framework that generates animations conforming to keyframe constraints. We incorporate keyframe constraints into the training phase to enhance robustness across a wide range of constraint densities. Moreover, we employ relative positional encoding to improve the model's generalization on long‐range in‐betweening tasks. This approach enables the model to learn from short animations while generating realistic in‐betweening motions spanning thousands of frames. We conduct extensive experiments to validate our framework using the newly proposed metrics K‐FID, K‐Diversity, and K‐Error, which are designed to evaluate generative in‐betweening methods. The results demonstrate that our method outperforms existing diffusion‐based methods across various sequence lengths and keyframe densities. We also show that our method can be applied to text‐driven motion synthesis, offering fine‐grained control over the generated results.
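To make the relative positional encoding idea concrete, the sketch below shows one common way such an encoding can be realized: a learnable per‐head bias over clipped frame offsets that is added to the attention logits. Because the bias depends only on the relative offset between frames, a model trained on short clips can be applied to much longer sequences at inference time. This is an illustrative assumption, not the paper's actual implementation; the class name `RelativePositionBias`, the clipping distance, and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class RelativePositionBias(nn.Module):
    """Illustrative relative positional bias added to attention logits.

    The bias is indexed only by the (clipped) frame offset i - j, so
    attention patterns learned on short training clips transfer to
    arbitrarily long sequences at inference time.
    """
    def __init__(self, num_heads: int, max_distance: int = 128):
        super().__init__()
        self.max_distance = max_distance
        # One learnable bias per head and per clipped relative offset.
        self.bias = nn.Embedding(2 * max_distance + 1, num_heads)

    def forward(self, seq_len: int) -> torch.Tensor:
        pos = torch.arange(seq_len)
        # Relative offsets i - j, clipped so offsets longer than any seen
        # during training reuse the bias learned for the farthest offset.
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_distance, self.max_distance)
        rel = rel + self.max_distance                  # shift to [0, 2 * max_distance]
        return self.bias(rel).permute(2, 0, 1)         # (num_heads, seq_len, seq_len)

# Usage: add the returned bias to the attention scores before the softmax.
bias = RelativePositionBias(num_heads=8)(seq_len=512)  # works for any sequence length
```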