A Multimodal Pairwise Discrimination Network for Cross-Domain Action Recognition

Fuhua Shang, Feng Tian, Jun Wei Tao, Tao Tao Han, Zan Gao

doi:10.1109/access.2020.3014691

Abstract

In recent years, action recognition has become a hot research topic in computer vision and machine learning. Although many well-designed action recognition approaches have been proposed, some limitations remain: the fusion of different spatio-temporal features is separated from the reconstruction classification model, and the training and testing data must be captured under similar environmental conditions. Thus, research interest has shifted from traditional action recognition towards cross-domain action recognition. To address these limitations, we propose a novel multimodal pairwise discrimination network (MPD) for cross-domain action recognition with an end-to-end architecture. MPD jointly fuses different spatio-temporal features from the video, learns domain-invariant features for the different action domains (source and target), and builds the classification model. To characterize the shift between these domains, subnetwork parameters in corresponding layers of MPD are required to be related, but not identical. In addition, the discriminability of the domain-invariant features needs to be improved. Extensive experiments on two public benchmarks, covering indoor and outdoor environments, demonstrate that MPD significantly outperforms state-of-the-art methods, with a 4% to 20% improvement in average accuracy.
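The abstract does not give the form of the constraint that ties corresponding source and target subnetwork layers together without making them identical. One common way to express such a soft tie, shown here purely as an illustrative assumption and not as the authors' formulation, is a penalty on the squared difference between corresponding layer weights, added to the task loss. The names `relatedness_penalty`, `total_loss`, and the weight `lam` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: each layer is a flat sequence of weights.
Layer = Sequence[float]


def relatedness_penalty(source: List[Layer], target: List[Layer]) -> float:
    """Sum of squared differences between corresponding layer weights.

    Zero when the two streams share weights exactly; grows as they diverge.
    This is an assumed regularizer, not the loss defined in the paper.
    """
    if len(source) != len(target):
        raise ValueError("streams must have the same number of layers")
    total = 0.0
    for ws, wt in zip(source, target):
        if len(ws) != len(wt):
            raise ValueError("corresponding layers must have the same size")
        total += sum((a - b) ** 2 for a, b in zip(ws, wt))
    return total


def total_loss(task_loss: float, source: List[Layer], target: List[Layer],
               lam: float = 0.1) -> float:
    """Task loss plus the weighted penalty (lam is an assumed hyperparameter)."""
    return task_loss + lam * relatedness_penalty(source, target)
```

Because the penalty is soft, the two streams can still drift apart where the domain shift demands it, which is the difference from strict weight sharing.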

Highlights

Since human action recognition has been widely applied in visual surveillance and some other domains [3], [5], [6], [13], [18], [25], [39], [40], [42], [46], [48], [56], it has become a hot research topic.
Although methods are effective in these controlled environments, the key bottleneck is that the cross-domain constraint is ignored, which limits the application of action recognition.
We evaluate our method in the context of domain adaptation for action recognition.

Summary

INTRODUCTION

Since human action recognition has been widely applied in visual surveillance and some other domains [3], [5], [6], [13], [18], [25], [39], [40], [42], [46], [48], [56], it has become a hot research topic. A novel end-to-end multimodal pairwise discrimination network for cross-domain action recognition is proposed, which can seamlessly fuse different visual feature representations and build the classification model. It can also lower the demand for large numbers of labeled action samples when training deep learning models.

RELATED WORK

PAIRWISE DISCRIMINATION LOSS

EXPERIMENTS AND DISCUSSION

COMPETITORS

Several popular methods are employed for comparison:

Findings

CONCLUSION