Abstract
Human motion segmentation (HMS) aims to segment a long human action video into a set of short, meaningful action clips. Existing supervised learning approaches need a large amount of training data, which may be costly to obtain in real-world scenarios, while most unsupervised clustering methods cannot fully explore the temporal correlations among human motions and therefore struggle to achieve promising performance. In this paper, we design a novel unsupervised framework, called Velocity-Sensitive Dual-Side Auto-Encoder (VSDA), for the HMS task. Specifically, a multi-neighbor auto-encoder (MNA) is proposed to extract informative temporal features by fully exploring the local temporal patterns of human motions. In addition, a long-short distance encoding (LSE) strategy is designed, which constrains the encoded representations of close (short-distance) frames to be similar and those of far-away (long-distance) frames to be distinct. The same strategy is also applied to the decoded outputs as the long-short distance decoding (LSD) module. Together, LSE and LSD guide the learning process explicitly and implicitly, yielding the dual-side structure. Moreover, we take the energy variations during human motion into account and propose a velocity-sensitive (VS) guidance mechanism for further model improvement. VSDA leverages the temporal characteristics of human motion and achieves promising HMS performance. Comprehensive experiments on six real-world human motion datasets demonstrate the effectiveness of the proposed model.
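The long-short distance idea can be illustrated with a small sketch. The abstract does not give the loss used in VSDA, so the hinge-style contrastive penalty below is an assumption chosen for illustration. The function name `long_short_distance_loss` and the parameters `short_window`, `long_gap`, and `margin` are hypothetical. The sketch pulls the codes of temporally close frames together and pushes the codes of far-apart frames at least `margin` apart.

```python
import math


def long_short_distance_loss(codes, short_window=1, long_gap=3, margin=1.0):
    """Illustrative penalty over per-frame codes (NOT the paper's loss).

    codes: list of equal-length float lists, one per frame, in temporal order.
    Frame pairs at most `short_window` apart are pulled together; pairs at
    least `long_gap` apart are pushed until their distance exceeds `margin`.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    pull, push, n_pull, n_push = 0.0, 0.0, 0, 0
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            d = dist(codes[i], codes[j])
            gap = j - i
            if gap <= short_window:
                # Short-distance pair: penalize any separation.
                pull += d ** 2
                n_pull += 1
            elif gap >= long_gap:
                # Long-distance pair: penalize only if closer than the margin.
                push += max(0.0, margin - d) ** 2
                n_push += 1
    # Average each term over its pair count; guard against empty sets.
    return pull / max(n_pull, 1) + push / max(n_push, 1)


# Toy codes: a smoothly drifting sequence versus one collapsed to a point.
smooth = [[0.5 * t, 0.0] for t in range(6)]
collapsed = [[0.0, 0.0] for _ in range(6)]
print(long_short_distance_loss(smooth))
print(long_short_distance_loss(collapsed))
```

In this toy setting the smoothly drifting codes score lower than the collapsed ones, because neighboring frames stay close while distant frames are separated by more than the margin. Under the same assumption, applying the function to decoder outputs instead of encoder codes would correspond to the decoding-side (LSD) constraint.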