Abstract

In this paper, we propose Optimal Action Segment Assignment (OASA), a novel method for dynamic label assignment in temporal action detection (TAD). OASA formulates label assignment as an optimal transport problem by computing a cost matrix between predicted temporal action segments and ground truths. The unit transportation cost for a pair consisting of a predicted temporal segment and a ground truth is defined as the weighted sum of the action classification loss and the temporal localization loss. Additionally, we use Adaptive Estimation of Candidate Segment Number (AE-CSN) to adaptively determine the number of positive samples for each ground truth. Under this formulation, label assignment reduces to finding a globally optimal assignment plan that minimizes the total transportation cost. OASA therefore eliminates the manually designed prior parameters that fixed label assignment methods require, and improves the generalization of the algorithm across datasets. To evaluate OASA, we also introduce a simple anchor-free temporal action detector called ActionMixer. It consists of two components: a Temporal Mixer and a Channel Mixer. The Temporal Mixer employs depth-wise convolution layers with large kernels to capture temporal information, while the Channel Mixer mixes and extracts features across the channel dimension. Extensive experiments on the THUMOS-14, ActivityNet-1.3, and EPIC-Kitchens-100 datasets show that ActionMixer equipped with OASA achieves state-of-the-art performance, surpassing other advanced temporal action detection methods.
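To make the optimal-transport view of label assignment concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the loss forms, the weighting, the solver, or how AE-CSN works, so everything below is an assumption: the classification cost is taken as negative log-probability, the localization cost as one minus temporal IoU, the candidate count per ground truth as the floored sum of its top-q IoUs (a heuristic borrowed from optimal-transport assignment in object detection), an extra "background" supplier absorbs the remaining predictions, and the plan is found with plain Sinkhorn iterations. All function and parameter names (`cost_matrix`, `estimate_k`, `sinkhorn_assign`, `loc_weight`, `top_q`, `reg`, `bg_cost`) are hypothetical.

```python
import math


def temporal_iou(a, b):
    """IoU of two 1-D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def cost_matrix(cls_prob, pred, gt, gt_labels, loc_weight=1.0):
    """cost[j][i]: assumed unit cost of assigning prediction i to ground truth j."""
    return [
        [-math.log(cls_prob[i][gt_labels[j]] + 1e-9)            # classification term
         + loc_weight * (1.0 - temporal_iou(pred[i], gt[j]))    # localization term
         for i in range(len(pred))]
        for j in range(len(gt))
    ]


def estimate_k(pred, gt, top_q=3):
    """Assumed stand-in for AE-CSN: k_j = floor(sum of top-q IoUs), at least 1."""
    ks = []
    for g in gt:
        ious = sorted((temporal_iou(p, g) for p in pred), reverse=True)
        ks.append(max(int(sum(ious[:top_q])), 1))
    return ks


def sinkhorn_assign(cost, k, bg_cost, reg=0.1, n_iter=100):
    """Entropy-regularized transport from ground truths (+ background) to predictions.

    Ground truth j supplies k[j] positive labels, the background supplies the rest,
    and every prediction demands exactly one label. Returns (labels, plan), where
    labels[i] is a ground-truth index or -1 for background.
    """
    m, n = len(cost), len(bg_cost)
    if sum(k) >= n:
        raise ValueError("requested positives must be fewer than predictions")
    supply = [float(x) for x in k] + [float(n - sum(k))]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost + [list(bg_cost)]]
    u, v = [1.0] * (m + 1), [1.0] * n
    for _ in range(n_iter):
        # Rescale rows to match the supply, then columns to match the unit demand.
        u = [supply[j] / sum(kernel[j][i] * v[i] for i in range(n)) for j in range(m + 1)]
        v = [1.0 / sum(kernel[j][i] * u[j] for j in range(m + 1)) for i in range(n)]
    plan = [[u[j] * kernel[j][i] * v[i] for i in range(n)] for j in range(m + 1)]
    labels = []
    for i in range(n):
        best = max(range(m + 1), key=lambda j: plan[j][i])
        labels.append(best if best < m else -1)
    return labels, plan
```

In this sketch each prediction takes the label of the supplier that sends it the most mass in the plan. The background cost for a prediction could, for example, be the negative log of one minus its highest foreground probability; that choice is also an assumption and not stated in the abstract.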
