Abstract
Generic Boundary Detection (GBD) aims to locate the generic boundaries that divide videos into semantically coherent, taxonomy-free units, and can serve as an important pre-processing step for long-form video understanding. Previous works often handle the different types of generic boundaries separately, with task-specific network designs ranging from simple CNNs to LSTMs. Instead, in this paper, we present Temporal Perceiver, a general Transformer-based architecture that offers a unified solution to the detection of arbitrary generic boundaries, ranging from shot-level and event-level to scene-level GBD. The core design is to introduce a small set of latent feature queries as anchors to compress the redundant video input into a fixed dimension via cross-attention blocks. Thanks to this fixed number of latent units, the quadratic complexity of the attention operation is reduced to a form linear in the number of input frames. Specifically, to explicitly leverage the temporal structure of videos, we construct two types of latent feature queries: boundary queries and context queries, which handle semantic incoherence and coherence, respectively. Moreover, to guide the learning of the latent feature queries, we propose an alignment loss on the cross-attention maps that explicitly encourages the boundary queries to attend to the top boundary candidates. Finally, we apply a sparse detection head to the compressed representation and directly output the final boundary detection results without any post-processing module. We test our Temporal Perceiver on a variety of GBD benchmarks. Our method obtains state-of-the-art results on all benchmarks with RGB single-stream features: SoccerNet-v2 (81.9% average-mAP), Kinetics-GEBD (86.0% average F1), TAPOS (73.2% average F1), MovieScenes (51.9% AP and 53.1% mIoU), and MovieNet (53.3% AP and 53.2% mIoU), demonstrating the generalization ability of our Temporal Perceiver. To further pursue a general GBD model, we combine various tasks to train a class-agnostic Temporal Perceiver and evaluate its performance across all benchmarks. Results show that the class-agnostic Temporal Perceiver achieves comparable detection accuracy and even better generalization ability than its dataset-specific counterparts.
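As a rough illustration of the compression idea described above (not the authors' released implementation), the sketch below shows how a fixed set of learnable latent queries can cross-attend to a variable-length sequence of frame features, so the attention cost grows linearly with the number of frames. All module names, dimensions, and the exact split into boundary and context queries are illustrative assumptions.

```python
# Minimal PyTorch sketch of latent-query cross-attention compression,
# loosely following the abstract; names and hyper-parameters are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    def __init__(self, dim=256, num_boundary=32, num_context=32, heads=8):
        super().__init__()
        # Learnable latent anchors: boundary queries target incoherent
        # (boundary-like) frames, context queries target coherent segments.
        self.boundary_q = nn.Parameter(torch.randn(num_boundary, dim))
        self.context_q = nn.Parameter(torch.randn(num_context, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frames):
        # frames: (B, T, dim) -- T may vary per video, but the output size
        # is fixed by the number of latent queries, so attention costs
        # O(T * num_queries) rather than O(T^2).
        B = frames.size(0)
        queries = torch.cat([self.boundary_q, self.context_q], dim=0)
        queries = queries.unsqueeze(0).expand(B, -1, -1)
        compressed, attn_map = self.attn(queries, frames, frames)
        # attn_map: (B, num_queries, T). An alignment loss on the rows of
        # the boundary queries could push them toward boundary candidates,
        # in the spirit of the alignment loss mentioned in the abstract.
        return self.norm(compressed), attn_map

# Usage: compress 512 frames into 64 latent units.
feats = torch.randn(2, 512, 256)
module = LatentCrossAttention()
z, attn = module(feats)
print(z.shape)     # torch.Size([2, 64, 256])
print(attn.shape)  # torch.Size([2, 64, 512])
```

A sparse detection head (e.g., a small feed-forward network over the compressed latents) could then regress boundary locations directly, consistent with the post-processing-free design the abstract describes.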