Abstract

This paper addresses the problem of learning action representations from skeleton sequences. We propose a novel framework for unsupervised representation learning based on a structure-asymmetrical auto-encoder, in which a 2D-CNN-based encoder learns separable spatiotemporal representations in a low-dimensional feature space under the supervision of salient skeleton motion cues. Through the proposed unsupervised training, the network captures not only correlations between adjacent joints but also long-term motion dependencies, so that similar movements gather in the same cluster while different movements fall into distinct clusters. The method does not rely on annotations to associate skeleton sequences with actions. Experimental results demonstrate the effectiveness of the proposed representation learning and show improvements over skeleton-based generative learning methods. When the proposed network was fine-tuned with partially labeled data, it still outperformed some fully supervised methods.

