Abstract

Weakly supervised temporal action detection aims to localize the temporal extent of action instances in untrimmed videos using only video-level action class labels. In general, previous methods classify each frame individually based on appearance and short-term motion information, and then merge consecutive high-response action frames into entities that serve as detected action instances. However, these methods do not fully exploit the long-range temporal dependencies between action frames, so the detection results tend to be trapped in the most discriminative action segments. To alleviate this issue, we propose a novel two-branch detection framework, consisting of a coarse detection branch and a refining detection branch, that learns long-range temporal dependencies to obtain more accurate detection results while requiring only action class labels. The coarse detection branch localizes the most discriminative segments of action instances under a typical multi-instance learning paradigm supervised by the action class labels, whereas the refining detection branch localizes the less discriminative segments by learning long-range temporal dependencies between frames through the proposed Transformer-style architecture and learning strategies. This collaboration mechanism takes full advantage of the complementary information in the provided action class labels and the natural temporal dependencies between action frames, forming a more comprehensive solution and yielding more precise detection results. The proposed method outperforms recent weakly supervised temporal action detection methods on the THUMOS14 and ActivityNet datasets in terms of mAP@tIoU and AR@AN.
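To make the multi-instance learning paradigm mentioned above concrete: with only a video-level class label, frame-level class scores (a class activation sequence) are typically aggregated into a single video-level prediction, e.g., by averaging the top-k frame scores per class. The sketch below illustrates that pooling idea only; it is not the paper's exact formulation, and all names and the value of k are illustrative.

```python
import numpy as np

def video_level_scores(cas, k):
    """Aggregate a class activation sequence `cas` of shape (T, C)
    (per-frame scores for C action classes over T frames) into
    video-level class scores by averaging the top-k frames per class,
    a common multi-instance learning pooling choice (illustrative)."""
    # Sort each class column, reverse to descending order, keep top-k frames.
    topk = np.sort(cas, axis=0)[::-1][:k]   # shape (k, C)
    return topk.mean(axis=0)                # shape (C,)

# Toy example: 8 frames, 3 hypothetical action classes.
rng = np.random.default_rng(0)
cas = rng.random((8, 3))
scores = video_level_scores(cas, k=3)
# Softmax turns the pooled scores into class probabilities that can be
# supervised with the video-level action class label.
probs = np.exp(scores) / np.exp(scores).sum()
```

Because only the highest-scoring frames contribute to the video-level loss, this pooling explains why a purely MIL-trained branch tends to focus on the most discriminative segments, which is precisely the behavior the refining branch is designed to complement.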
