Abstract
Human beings can concentrate on the most semantically relevant visual information when recognizing actions, which allows them to make reasonable and interpretable predictions. However, most existing approaches to visual tasks do not explicitly imitate this ability to improve the performance and reliability of models. In this paper, we propose an interpretable action recognition framework that improves both the performance and the visual interpretability of 3D CNNs. Specifically, we design a semantic-aware attention module that learns correlative spatial-temporal attention for different action categories. To further exploit the rich semantics of features extracted from different layers, we design a hierarchical semantic fusion module guided by the learned attention. The two modules enhance and complement each other, and the semantic-aware attention module enjoys the plug-and-play merit. We evaluate our method on multiple benchmarks with comprehensive ablation studies and visualization analyses. Experimental results demonstrate the effectiveness of our method, showing favorable accuracy against state-of-the-art approaches while enhancing semantic interpretability. Code will be available at https://github.com/PHDJieFu.
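To make the idea of category-wise spatial-temporal attention concrete, the following is a minimal sketch of what a plug-and-play semantic-aware attention module for 3D CNN features might look like. It is an illustration under our own assumptions, not the paper's actual implementation: the class name, shapes, and the softmax-over-positions design are hypothetical, and the paper's module may differ substantially.

```python
import torch
import torch.nn as nn


class SemanticAwareAttention(nn.Module):
    """Hypothetical sketch: per-category spatio-temporal attention over
    3D CNN features of shape (B, C, T, H, W). All design choices here
    are assumptions for illustration, not the authors' implementation."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        # A 1x1x1 conv produces one attention logit per action category
        # at every spatio-temporal position.
        self.att_conv = nn.Conv3d(in_channels, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor):
        # x: (B, C, T, H, W)
        att_logits = self.att_conv(x)                # (B, K, T, H, W)
        b, k, t, h, w = att_logits.shape
        # Normalize each category's map over all spatio-temporal positions.
        att = att_logits.flatten(2).softmax(dim=-1).view(b, k, t, h, w)
        # Average across categories and re-weight the features; the
        # residual connection keeps the module plug-and-play.
        pooled_att = att.mean(dim=1, keepdim=True)   # (B, 1, T, H, W)
        out = x * pooled_att + x
        return out, att


# Usage: insert between stages of a 3D CNN backbone.
feats = torch.randn(2, 256, 8, 14, 14)               # (B, C, T, H, W)
module = SemanticAwareAttention(in_channels=256, num_classes=400)
out, att = module(feats)
print(out.shape, att.shape)  # (2, 256, 8, 14, 14), (2, 400, 8, 14, 14)
```

Since the module preserves the input feature shape, it can be dropped into an existing backbone without architectural changes; the per-category maps `att` are what one would visualize to assess semantic interpretability.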