Abstract

Most current few-shot action recognition methods model temporal relationships on top of image classification and achieve satisfactory results. However, they concentrate on the extra temporal information that video carries over still images and match queries through frame-tuple embedding representations, while ignoring an important cue for action recognition: the "action-changing feature". To exploit this cue, we propose Temporal Relational CrossTransformers Based on the Image Difference Pyramid (TRX-IDP) for few-shot action recognition. Building on TRX, we apply high-order image differencing, sigmoid enhancement, and resizing to the frame tuples used directly for the query, and we also use these tuples to compute a Motion History Image (MHI). Combining the two, we construct the Image Difference Pyramid (IDP), which encodes motion feature information. We further develop a CrossTransformers query representation for the IDP and restructure the model's linear mapping function. We evaluate TRX-IDP on four commonly used few-shot action recognition benchmark datasets. It achieves state-of-the-art performance on the partial SSv2 split, HMDB51, and UCF101, while slightly lagging behind the current best models on Kinetics and the full SSv2 split. In addition, we conduct detailed ablation studies on TRX-IDP that demonstrate the contribution of each component and identify its best hyperparameters.
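
As a concrete illustration of the pipeline summarized above, the minimal Python sketch below shows one way an Image Difference Pyramid could be assembled from a tuple of grayscale frames: high-order temporal differencing, sigmoid enhancement, resizing, and a Motion History Image accumulated from the same tuple. The function names, the sigmoid steepness `alpha`, the motion threshold, and the number of difference orders are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of building an Image Difference
# Pyramid from a frame tuple of shape (T, H, W), grayscale, uint8 or float.
import numpy as np
import cv2


def high_order_difference(frames: np.ndarray, order: int) -> np.ndarray:
    """Apply the temporal difference operator `order` times to a (T, H, W) clip."""
    diff = frames.astype(np.float32)
    for _ in range(order):
        diff = diff[1:] - diff[:-1]  # each pass shortens the clip by one frame
    return diff


def sigmoid_enhance(diff: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Squash differences into (0, 1), emphasising strongly changing regions."""
    return 1.0 / (1.0 + np.exp(-alpha * diff))


def motion_history_image(frames: np.ndarray, threshold: float = 15.0) -> np.ndarray:
    """Classic MHI: moving pixels take the current timestamp, others decay."""
    num_frames, height, width = frames.shape
    mhi = np.zeros((height, width), np.float32)
    for t in range(1, num_frames):
        motion = np.abs(frames[t].astype(np.float32)
                        - frames[t - 1].astype(np.float32)) > threshold
        mhi = np.where(motion, float(t), np.maximum(mhi - 1.0, 0.0))
    return mhi / max(num_frames - 1, 1)  # normalise to [0, 1]


def image_difference_pyramid(frames: np.ndarray, max_order: int = 2,
                             size: tuple = (224, 224)) -> list:
    """Combine resized high-order differences with the MHI into one pyramid."""
    levels = []
    for order in range(1, max_order + 1):
        enhanced = sigmoid_enhance(high_order_difference(frames, order))
        levels.append(np.stack([cv2.resize(f, size) for f in enhanced]))
    levels.append(cv2.resize(motion_history_image(frames), size))
    return levels
```

In this sketch each pyramid level keeps motion information at a different temporal order, and the MHI adds a long-range summary of where motion occurred across the whole tuple; the actual TRX-IDP model would feed such representations to its CrossTransformers query branch.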
