Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

Wonjun Moon, Jae-Pil Heo, Hyun Seok Seong

doi:10.1609/aaai.v37i2.25284

Abstract

A dramatic increase in real-world video volume, with extremely diverse and emerging topics, naturally forms a long-tailed distribution over video categories, which highlights the need for Video Long-Tailed Recognition (VLTR). In this work, we summarize the challenges in VLTR and explore how to overcome them. The challenges are: (1) it is impractical to re-train the whole model to obtain high-quality features, (2) acquiring frame-wise labels is costly, and (3) long-tailed data triggers biased training. Consequently, most existing works on VLTR unavoidably rely on image-level features extracted from pretrained models, which are task-irrelevant, and learn from video-level labels. To deal with such (1) task-irrelevant features and (2) video-level labels, we introduce two complementary learnable feature aggregators. The learnable layers in each aggregator produce task-relevant representations, and each aggregator assembles the snippet-wise knowledge into a video-level representation. We then propose Minority-Oriented Vicinity Expansion (MOVE), which explicitly uses class frequency when approximating the vicinity distributions in order to alleviate (3) biased training. By combining these solutions, our approach achieves state-of-the-art results on the large-scale VideoLT dataset and on the synthetically induced Imbalanced-MiniKinetics200. With VideoLT features from ResNet-50, it attains 18% and 58% relative improvements on head and tail classes, respectively, over the previous state-of-the-art method. Code and dataset are available at https://github.com/wjun0830/MOVE.
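
The abstract does not specify how the aggregators are built, so the following is only a minimal sketch of one common way to assemble snippet-wise features into a single video-level vector: softmax attention pooling. It is not the authors' implementation. The function name `attentive_pool` and the `query` scoring vector are hypothetical, introduced purely for illustration; in a trained model the query would be a learned parameter.

```python
import math


def attentive_pool(snippets, query):
    """Pool T snippet features (each of length D) into one video-level vector.

    snippets: list of T feature vectors (lists of floats).
    query:    hypothetical scoring vector of length D (assumed learnable).
    Returns (video_vector, attention_weights).
    """
    # Score each snippet by its dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, s)) for s in snippets]

    # Numerically stable softmax over the snippet scores.
    top = max(scores)
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of snippet features gives the video-level representation.
    dim = len(snippets[0])
    video = [sum(w * s[d] for w, s in zip(weights, snippets)) for d in range(dim)]
    return video, weights
```

With an all-zero query every snippet receives the same weight, so the pooling reduces to a plain average; a non-zero query shifts weight toward the snippets it scores highest.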
