ConvTransformer Attention Network for temporal action detection

Di Cui, Chang Xin, Lifang Wu, Xiangdong Wang

doi:10.1016/j.knosys.2024.112264

Abstract

Boundary detection is a challenging problem in Temporal Action Detection (TAD). Transformer-based methods achieve satisfactory results by using self-attention to model global dependencies for boundary detection, but they face two key issues. First, they lack explicit learning of local relationships, which leads to imprecise boundary detection when only subtle appearance changes occur between adjacent clips. Second, because self-attention tends to distribute focus across the entire input video, the features of multiple actions converge, resulting in imprecise predictions for overlapping actions. To address these challenges, we introduce the ConvTransformer Attention Network (CTAN), a novel framework comprising two primary components: (1) the Temporal Attention Block (TAB), a temporal attention mechanism designed to emphasize critical temporal positions enriched with essential action-related features; and (2) the ConvTransformer Block (CTB), which employs a hybrid structure to capture nuanced appearance changes locally and action transitions globally. With these components, CTAN focuses on motion features between overlapping actions and precisely captures both local differences between adjacent clips and global action transitions. Extensive experiments on multiple datasets, including THUMOS14, MultiTHUMOS, and ActivityNet, confirm the effectiveness of CTAN.
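
The two ideas named in the abstract, temporal reweighting followed by a hybrid of local convolution and global self-attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how TAB scores positions or how CTB merges its branches, so the mean-based saliency score, the fixed smoothing kernel, the identity attention projections, and the residual sum below are all assumptions chosen for illustration, as are the function names.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(features):
    """Reweight each time step by a softmax over per-step saliency.
    Assumption: saliency is the mean feature value, a stand-in for a learned scorer."""
    scores = [sum(f) / len(f) for f in features]
    weights = softmax(scores)
    T = len(features)
    # Scale by T so that uniform weights leave the features unchanged.
    return [[w * T * x for x in f] for w, f in zip(weights, features)]

def local_conv(features, kernel=(0.25, 0.5, 0.25)):
    """Depthwise 1D convolution along time with replicate padding (local branch).
    Assumption: a fixed smoothing kernel instead of learned weights."""
    T = len(features)
    half = len(kernel) // 2
    out = []
    for t in range(T):
        row = [0.0] * len(features[0])
        for k, w in enumerate(kernel):
            src = min(max(t + k - half, 0), T - 1)  # clamp index at the edges
            for d, x in enumerate(features[src]):
                row[d] += w * x
        out.append(row)
    return out

def global_self_attention(features):
    """Single-head scaled dot-product self-attention (global branch).
    Assumption: identity query/key/value projections."""
    d = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)])
    return out

def conv_transformer_block(features):
    """Hypothetical hybrid block: residual input + local branch + global branch."""
    local = local_conv(features)
    glob = global_self_attention(features)
    return [[x + l + g for x, l, g in zip(fx, fl, fg)]
            for fx, fl, fg in zip(features, local, glob)]

# Five clips with two feature channels each; clips 3 and 4 carry the "action".
clips = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [1.0, 0.9], [0.2, 0.1]]
attended = temporal_attention(clips)
output = conv_transformer_block(attended)
print(len(output), len(output[0]))  # 5 2
```

In this toy setup the local branch only sees a three-clip window, while the global branch mixes every clip with every other one, which is the division of labor the abstract attributes to the hybrid structure. A real model would learn the kernel, the projections, and the saliency scorer.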

Published in: Knowledge-Based Systems
