Abstract

Temporally related samples typically exhibit large intra-class appearance variation, on which many existing action recognition algorithms perform poorly. In this paper, we address this issue by using temporal information more effectively. We propose a novel lightweight Voting-based Temporal Correlation (VTC) module to enhance temporal cues. VTC integrates a sparse temporal sampling strategy into feature sequences, which mitigates the effect of redundant information and focuses the model on temporal modeling. Furthermore, we propose a simple and intuitive Similarity Loss (SL) to guide the training of VTC. By intentionally introducing confusion into the predicted vector, SL eases intra-class variation by discovering class-specific common motion patterns rather than sample-specific discriminative information. Combining VTC and SL with complementary advances in this field, we clearly outperform state-of-the-art results on the HMDB51, UCF101, and Something-something-v1 datasets. The code is publicly available at https://github.com/FingerRec/TRS.

