Abstract

Pre-trained vision-language (ViL) models have demonstrated strong zero-shot capability in video understanding tasks, where they are usually adapted through fine-tuning or temporal modeling. However, in the task of open-vocabulary temporal action localization (OV-TAL), such adaptation reduces the robustness of ViL models to different data distributions, leading to a misalignment between visual representations and the text descriptions of unseen action categories. As a result, existing methods are often forced to trade off action detection against action classification. To address this issue, this paper proposes DeTAL, a simple but effective two-stage approach for OV-TAL. DeTAL decouples action detection from action classification to avoid the compromise between them, and state-of-the-art methods for closed-set action localization can be readily adapted to OV-TAL, which significantly improves performance. Moreover, DeTAL easily handles the scenario where action category annotations are unavailable in the training dataset. In the experiments, we propose a new cross-dataset setting to evaluate the zero-shot capability of different methods. The results demonstrate that DeTAL outperforms state-of-the-art methods for OV-TAL on both THUMOS14 and ActivityNet1.3. Code and data are publicly available at https://github.com/vsislab/DeTAL.
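
To make the decoupling concrete, the sketch below illustrates the general shape of such a two-stage pipeline: a class-agnostic detector proposes temporal segments, and each segment is then classified zero-shot by matching pooled visual features against frozen text embeddings of an arbitrary vocabulary. All function names, feature shapes, and the placeholder encoders are assumptions for illustration only, not DeTAL's actual implementation.

```python
# Illustrative sketch of a decoupled two-stage OV-TAL pipeline.
# NOTE: names, shapes, and the dummy encoders are assumptions, not DeTAL's code.
import numpy as np

def detect_proposals(video_feats: np.ndarray) -> list[tuple[int, int, float]]:
    """Stage 1: class-agnostic proposals as (start, end, actionness) tuples.
    A real system would use a trained closed-set localizer; here we simply
    threshold a dummy per-frame actionness score for illustration."""
    actionness = video_feats.mean(axis=1)            # (T,) pseudo actionness
    mask = actionness > actionness.mean()
    proposals, start = [], None
    for t, m in enumerate(mask):                     # group consecutive frames
        if m and start is None:
            start = t
        elif not m and start is not None:
            proposals.append((start, t, float(actionness[start:t].mean())))
            start = None
    if start is not None:
        proposals.append((start, len(mask), float(actionness[start:].mean())))
    return proposals

def classify_open_vocab(segment_feat, text_embeds, class_names):
    """Stage 2: zero-shot classification by cosine similarity between the
    pooled segment feature and frozen ViL text embeddings of any vocabulary."""
    seg = segment_feat / np.linalg.norm(segment_feat)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = txt @ seg
    return class_names[int(sims.argmax())], float(sims.max())

# Toy usage: T=100 frames, D=512-dim features; random arrays stand in for
# the video backbone and the ViL text encoder (both placeholders).
T, D = 100, 512
video_feats = np.random.rand(T, D)
class_names = ["high jump", "long jump", "pole vault"]   # unseen vocabulary
text_embeds = np.random.rand(len(class_names), D)

for s, e, score in detect_proposals(video_feats):
    feat = video_feats[s:e].mean(axis=0)
    label, sim = classify_open_vocab(feat, text_embeds, class_names)
    print(f"[{s}, {e}) actionness={score:.2f} -> {label} (sim={sim:.2f})")
```

Because the detector never sees category labels, the classifier's vocabulary can be swapped at inference time, which is what allows the approach to handle training data without category annotations.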
