Abstract

In the past few years, scene contextual information has been increasingly used for action understanding, with promising results. However, unsupervised video action clustering using context has been less explored, and existing clustering methods cannot achieve satisfactory performance. In this paper, we propose a novel unsupervised video action clustering method based on a motion-scene interaction constraint (MSIC). The proposed method takes the static scene and dynamic motion characteristics unique to video action into account, and develops a contextual interaction constraint model under a self-representation subspace clustering framework. First, the complementarity of the multi-view subspace representation within each context is explored through single-view and multi-view constraints. Then, the context-constrained affinity matrix is computed, and the MSIC is introduced to mutually regularize the disagreement between the subspace representations of scene and motion. Finally, by jointly constraining the complementarity of multiple views and the consistency of multiple contexts, an overall objective function is constructed to obtain the final video action clustering result. Experiments on four video benchmark datasets (Weizmann, KTH, UCFsports, and Olympic) demonstrate that the proposed method outperforms state-of-the-art methods.
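For readers unfamiliar with the self-representation subspace clustering framework the abstract builds on, the sketch below shows the generic single-view pipeline: each sample is expressed as a linear combination of the other samples, the coefficient matrix is symmetrized into an affinity matrix, and spectral clustering is applied. This is an illustrative baseline only, not the paper's MSIC method; the ridge-regularized self-expression and the parameter `lam` are assumptions for the sketch.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def self_representation_affinity(X, lam=0.1):
    """Build an affinity matrix from a self-expressive coefficient matrix.

    X   : (d, n) data matrix, one sample per column.
    lam : ridge regularization weight (illustrative choice).

    Solves min_C ||X - XC||_F^2 + lam * ||C||_F^2 in closed form,
    a simplified stand-in for the sparse/low-rank self-representation
    objectives typically used in subspace clustering.
    """
    n = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(C, 0.0)         # a sample should not represent itself
    W = np.abs(C) + np.abs(C).T      # symmetric, nonnegative affinity
    return W

def subspace_cluster(X, n_clusters, lam=0.1, seed=0):
    """Cluster columns of X by spectral clustering on the affinity matrix."""
    W = self_representation_affinity(X, lam)
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="precomputed",
                               random_state=seed)
    return model.fit_predict(W)
```

In the paper's setting, one such self-representation would be computed per view within each context (scene and motion), with the MSIC term penalizing disagreement between the context-wise affinity matrices before the final spectral step.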
