Abstract
Early action recognition, i.e., recognizing an action before it is fully performed, is a challenging and important task. Existing works mainly focus on deterministic early action recognition, outputting only a single class, and ignore the uncertainty and diversity inherent in this task. Intuitively, when only the early portion of an action is observed, multiple full actions remain possible, since diverse actions can share almost identical early segments in many scenarios. Taking uncertainty and diversity into account and outputting multiple plausible predictions, rather than a single one, is therefore important both for authenticity and for many practical applications. To this end, we propose a novel Diversified Early Action Recognition Network (Dear-Net) that outputs multiple reasonable action classes for each partial sequence by utilizing mode conversion. Specifically, we introduce an effective action diversity learning strategy that drives our network towards predicting diverse and reasonable results, in which each learnable action class is matched with its most suitable mode. Collapsed modes, which fail to receive any action class, are also handled by this strategy to ensure diversity. Moreover, we design a sequence decoder within our network to capture latent global information for better early action recognition; it also provides a feasible scheme for the weakly-supervised setting, in which Dear-Net leverages unlabelled data to improve performance. Experimental results on three challenging datasets clearly show the effectiveness of our approach.
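The abstract does not give the exact training objective, but the core idea of multi-mode prediction with per-class mode matching can be illustrated with a minimal winner-takes-all sketch: each of K prediction modes scores all action classes, and only the mode that best explains the ground-truth label receives the loss (and hence the gradient), which encourages modes to specialize on different plausible continuations. All names below (`best_mode_loss`, the toy logits) are hypothetical illustrations, not the paper's actual formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def best_mode_loss(mode_logits, label):
    """Winner-takes-all loss over K prediction modes (illustrative sketch).

    mode_logits: list of K rows, each a list of C class scores.
    label: ground-truth class index.
    Returns (loss, winning_mode): the cross-entropy of the mode that best
    explains the label. In a real system, modes left unmatched for too long
    ("collapsed" modes) would need extra handling to preserve diversity.
    """
    losses = [-math.log(softmax(row)[label] + 1e-12) for row in mode_logits]
    k = min(range(len(losses)), key=losses.__getitem__)
    return losses[k], k

# Toy example: two modes scoring three action classes for a partial clip.
logits = [[2.0, 0.1, 0.0],   # mode 0 favours class 0
          [0.0, 3.0, 0.0]]   # mode 1 favours class 1
loss, winner = best_mode_loss(logits, label=1)
print(winner)  # mode 1 wins, since it best explains label 1
```

Only the winning mode is penalized, so different modes are free to commit to different full-action hypotheses that share the same early segment.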