Spatial-Temporal Exclusive Capsule Network for Open Set Action Recognition

Yangbo Feng, Junyu Gao, Changsheng Xu, Shicai Yang

doi:10.1109/tmm.2023.3252275

Abstract

Open set action recognition (OSAR) is an emerging research area whose goal is to simultaneously classify videos from known classes and reject videos from unknown classes. Existing methods rarely consider the open set data distribution or the spatial-temporal relations among video subsequences. The recently proposed Capsule Network (CapsNet) has shown robust performance in many fields, especially image recognition. However, CapsNet has not been directly applied to OSAR, since it cannot explicitly account for the data distributions of known and unknown classes or for the spatial-temporal relations in videos. This paper proposes the Spatial-Temporal Exclusive Capsule Network (STE-CapsNet) to address these problems. STE-CapsNet uses a temporal-spatial routing mechanism to jointly capture the spatial and temporal information in videos. Furthermore, exclusive capsules are learned with a dot-product routing mechanism that constrains the data distributions of the closed set and the open set, thereby reducing the open set risk. Extensive experiments on three standard datasets show that the proposed approach performs favorably against state-of-the-art methods, demonstrating its effectiveness and generalization ability.
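The abstract names a dot-product routing mechanism but does not give its equations. As a rough illustration only, the sketch below implements generic capsule routing-by-agreement in the style of the original CapsNet: coupling logits start at zero and are increased by the dot product between each prediction vector and the current output capsule. A simple length-threshold rule then rejects a sample as unknown. This is not STE-CapsNet. It omits the temporal-spatial routing and any mechanism that makes capsules exclusive. The names `dot_product_routing` and `open_set_predict` and the `threshold` of 0.5 are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """CapsNet squashing: keeps the direction of s and maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dot_product_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Route predictions u_hat[i][j] (input capsule i -> output capsule j).

    Hypothetical generic routing, not the paper's exact mechanism: the
    logits b[i][j] grow by the dot-product agreement u_hat[i][j] . v[j].
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]
    v: List[Vector] = [[0.0] * dim for _ in range(n_out)]
    for it in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients, sum to 1 over j
        for j in range(n_out):
            s = [0.0] * dim
            for i in range(n_in):
                for d in range(dim):
                    s[d] += c[i][j] * u_hat[i][j][d]
            v[j] = squash(s)
        if it < iterations - 1:  # no logit update is needed after the last pass
            for i in range(n_in):
                for j in range(n_out):
                    b[i][j] += dot(u_hat[i][j], v[j])
    return v


def open_set_predict(v: List[Vector], threshold: float = 0.5) -> int:
    """Return the index of the longest output capsule, or -1 (unknown)
    if no capsule length reaches the hypothetical threshold."""
    lengths = [math.sqrt(dot(x, x)) for x in v]
    best = max(range(len(lengths)), key=lambda j: lengths[j])
    return best if lengths[best] >= threshold else -1


known = dot_product_routing([[[2.0, 0.0], [0.1, 0.0]], [[2.0, 0.0], [-0.1, 0.0]]])
unknown = dot_product_routing([[[0.3, 0.0], [0.0, 0.3]], [[-0.3, 0.0], [0.0, -0.3]]])
print(open_set_predict(known), open_set_predict(unknown))
```

In the first call, both input capsules agree on class 0. The class-0 capsule grows to a length of about 0.94 and the sample is accepted. In the second call, the predictions cancel out, so every output capsule has length zero and the sample is rejected as unknown (-1). Thresholding capsule length is only one possible open-set rule, chosen here because a squashed capsule's length already lies in [0, 1).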
