Abstract

In the past few years, scene contextual information has been increasingly used for action understanding, with promising results. However, unsupervised video action clustering using context has been less explored, and existing clustering methods cannot achieve satisfactory performance. In this paper, we propose a novel unsupervised video action clustering method based on a motion-scene interaction constraint (MSIC). The proposed method takes the static scene and dynamic motion characteristics unique to video action into account, and develops a contextual interaction constraint model under a self-representation subspace clustering framework. First, the complementarity of the multi-view subspace representation within each context is explored through single-view and multi-view constraints. Then, the context-constrained affinity matrix is computed, and the MSIC is introduced to mutually regularize the disagreement between the subspace representations of scene and motion. Finally, by jointly constraining the complementarity of multiple views and the consistency of multiple contexts, an overall objective function is constructed to obtain the final video action clustering result. Experiments on four video benchmark datasets (Weizmann, KTH, UCFsports, and Olympic) demonstrate that the proposed method outperforms state-of-the-art methods.
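For readers unfamiliar with the self-representation subspace clustering framework the abstract builds on, the sketch below shows the generic single-view pipeline: each sample is expressed as a linear combination of the other samples, the coefficient matrix is symmetrized into an affinity matrix, and spectral clustering is applied. This is an illustrative baseline only, not the paper's MSIC method; the ridge-regularized self-expression and the parameter `lam` are assumptions for the sketch.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def self_representation_affinity(X, lam=0.1):
    """Build an affinity matrix from a self-expressive coefficient matrix.

    X   : (d, n) data matrix, one sample per column.
    lam : ridge regularization weight (illustrative choice).

    Solves min_C ||X - XC||_F^2 + lam * ||C||_F^2 in closed form,
    a simplified stand-in for the sparse/low-rank self-representation
    objectives typically used in subspace clustering.
    """
    n = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(C, 0.0)         # a sample should not represent itself
    W = np.abs(C) + np.abs(C).T      # symmetric, nonnegative affinity
    return W

def subspace_cluster(X, n_clusters, lam=0.1, seed=0):
    """Cluster columns of X by spectral clustering on the affinity matrix."""
    W = self_representation_affinity(X, lam)
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="precomputed",
                               random_state=seed)
    return model.fit_predict(W)
```

In the paper's setting, one such self-representation would be computed per view within each context (scene and motion), with the MSIC term penalizing disagreement between the context-wise affinity matrices before the final spectral step.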
