Abstract

This paper focuses on skeleton-based few-shot action recognition. Since a skeleton is essentially a sparse representation of human action, the feature maps extracted from it by a standard encoder network under the few-shot condition may not be sufficiently discriminative for action sequences that look partially similar to one another. To address this issue, we propose a self and mutual adaptive matching (SMAM) module that converts such feature maps into more discriminative feature vectors. Our method, named SMAM-Net, first leverages both the temporal information associated with each individual skeleton joint and the spatial relationships among joints for feature extraction. The SMAM module then adaptively measures the similarity between labeled and query samples and further performs feature matching within the query set to distinguish similar skeletons belonging to different action categories. Experimental results show that SMAM-Net outperforms other baselines on the large-scale NTU RGB+D 120 dataset in the tasks of one-shot and five-shot action recognition. We also report results on the smaller NTU RGB+D 60, SYSU and PKU-MMD datasets to demonstrate that our method is reliable and generalises well across datasets. Code and the pretrained SMAM-Net will be made publicly available.
