Discriminative Relational Representation Learning for RGB-D Action Recognition.

Yu Kong, Yun Fu

doi:10.1109/tip.2016.2556940

Abstract

This paper addresses the problem of recognizing human actions from RGB-D videos. A discriminative relational feature learning method is proposed for fusing the heterogeneous RGB and depth modalities and classifying the actions in RGB-D sequences. Our method factorizes the feature matrix of each modality and enforces the same semantics across the two factorizations in order to learn shared features from multimodal data. This allows us to capture the complex correlations between the two modalities. To improve the discriminative power of the relational features, we introduce a hinge loss that measures the classification error incurred when the features are used for classification. This essentially performs supervised factorization and learns discriminative features that are optimized for classification. We formulate the recognition task within a maximum margin framework and solve the formulation using a coordinate descent algorithm. The proposed method is extensively evaluated on two public RGB-D action data sets. We demonstrate that the proposed method can learn extremely low-dimensional features with superior discriminative power and that it outperforms state-of-the-art methods. It also achieves high performance when one modality is missing in testing or training.
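
As a reading aid, the ingredients named in the abstract (per-modality factorization, a shared representation, a hinge loss, and a maximum margin formulation) can be combined into one generic objective. The form below is only an illustrative sketch and is not taken from the paper: every symbol is an assumption introduced here, the two-class case is used for simplicity, and the shared semantics are modeled by a single common factor V, which may differ from the authors' exact formulation.

```latex
% Illustrative sketch only; all symbols are assumptions, not the paper's notation.
% X_1 (RGB) and X_2 (depth): d_m x N feature matrices over N training videos.
% U_m: d_m x k modality-specific basis;  V: k x N shared features, column v_i.
% y_i in {-1, +1}: action label;  w: max-margin classifier;  lambda, C: weights.
\min_{U_1,\, U_2,\, V,\, w}\;
  \sum_{m=1}^{2} \lVert X_m - U_m V \rVert_F^2
  \;+\; \frac{\lambda}{2} \lVert w \rVert_2^2
  \;+\; C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\, w^{\top} v_i\bigr)
```

Under these assumptions, the first term ties both modalities to the same low-dimensional features V, and the last two terms make the factorization supervised by penalizing margin violations of a classifier applied to V. A coordinate descent scheme of the kind the abstract mentions would update one block of variables (U_1, U_2, V, or w) at a time while holding the others fixed.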
