Abstract

Early activity prediction/recognition aims to recognize action categories before the actions are fully conveyed. Compared to full-length action sequences, partial video sequences provide insufficient discriminative information, which makes it challenging to predict the class labels of similar activities, especially when only very few frames can be observed. To address this challenge, we propose a novel meta negative network, namely Magi-Net, that uses a contrastive learning scheme to alleviate the insufficiency of discriminative information. In Magi-Net, the positive samples are generated by augmenting an input anchor conditioned on all observation ratios, while the negative samples are selected from a trainable negative look-up memory (LUM) table, which stores the training samples and their corresponding misleading categories. Furthermore, a meta negative sample optimization strategy (MetaSOS) is proposed to boost the training of Magi-Net by encouraging the model to learn from the most informative negative samples via a meta learning scheme. Extensive experiments are conducted on several public skeleton-based activity datasets, and the results show the efficacy of the proposed Magi-Net model.
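The abstract does not state the exact loss used by Magi-Net, so the following is only a minimal sketch of a generic InfoNCE-style contrastive objective, given as an illustration of how an anchor can be pulled toward its positives and pushed away from negatives. The function names, the temperature value, and the use of plain arrays in place of the LUM table and of learned embeddings are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce_loss(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not the paper's code).

    anchor:    (d,)   embedding of the partially observed sequence
    positives: (P, d) embeddings of augmented views of the anchor
    negatives: (N, d) embeddings of samples from misleading categories
    """
    a = l2_normalize(anchor)
    pos_logits = l2_normalize(positives) @ a / temperature  # shape (P,)
    neg_logits = l2_normalize(negatives) @ a / temperature  # shape (N,)

    losses = []
    for p in pos_logits:
        # Contrast one positive against all negatives.
        logits = np.concatenate(([p], neg_logits))
        m = logits.max()  # subtract the max for numerical stability
        log_denominator = m + np.log(np.exp(logits - m).sum())
        losses.append(log_denominator - p)  # equals -log softmax of the positive
    return float(np.mean(losses))

# Toy usage: a positive aligned with the anchor should give a lower loss
# than a positive pointing away from it.
anchor = np.array([1.0, 0.0])
good = info_nce_loss(anchor, np.array([[1.0, 0.0]]), np.array([[-1.0, 0.0]]))
bad = info_nce_loss(anchor, np.array([[-1.0, 0.0]]), np.array([[1.0, 0.0]]))
```

In this sketch the negatives are passed in directly; in a setting like the one the abstract describes, they would instead be drawn from a memory of hard, misleading samples, and how those samples are chosen and weighted is exactly what the sketch leaves out.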
