Abstract

RGB-D-based human action recognition is attracting increasing attention because the different modalities can provide complementary information. However, recognition performance is still unsatisfactory because of the limited ability to learn spatial-temporal features and insufficient inter-modal interaction. In this paper, we propose a novel approach for RGB-D human action recognition that aggregates spatial-temporal information and performs cross-modality interactive learning. First, we propose a spatial-temporal information aggregation module (STIAM). It uses simple convolutional neural networks (CNNs) to efficiently aggregate the spatial-temporal information of an entire RGB-D sequence into lightweight representations. This allows the model to extract richer spatial-temporal features with limited extra memory and computational cost. Second, we propose a cross-modality interactive module (CMIM) to fully fuse the complementary multi-modal information. We then construct a multi-modal interactive network (MMINet) for RGB-D-based action recognition by embedding these two modules into a two-stream CNN. To verify the generality of our approach, we deploy two different backbones in the two-stream architecture in turn. Ablation experiments demonstrate that STIAM significantly improves action recognition, and that CMIM further exploits the complementary features of the multiple modalities. Extensive experiments on the NTU RGB+D 60, NTU RGB+D 120 and PKU-MMD datasets demonstrate the effectiveness of the proposed approach. It achieves state-of-the-art accuracies of 94.3% (cross-subject) and 96.5% (cross-view) on NTU RGB+D 60, 91.7% (cross-subject) and 92.6% (cross-setup) on NTU RGB+D 120, and 93.6% (cross-subject) and 94.2% (cross-view) on PKU-MMD. Further analysis indicates that our approach is particularly advantageous for recognizing subtle actions.
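The abstract does not describe the internals of STIAM or CMIM. The following NumPy sketch is purely illustrative and shows one hypothetical way to realize the two general ideas. It is not the authors' design. `aggregate_sequence` is a name introduced here. It collapses a clip of T frames into a single frame-sized map using per-frame weights, which is equivalent to a 1x1 convolution along the time axis with one output channel. `cross_modal_gate` is also hypothetical. It lets each stream re-weight the other through a sigmoid gate before the two streams are summed. In a real network the weights would be learned and the inputs would be backbone feature maps.

```python
import numpy as np


def aggregate_sequence(frames: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Collapse a (T, C, H, W) clip into one (C, H, W) map.

    Hypothetical stand-in for a spatial-temporal aggregation module:
    a weighted sum over frames, i.e. a 1x1 temporal convolution.
    """
    if weights.shape != (frames.shape[0],):
        raise ValueError("need exactly one weight per frame")
    # Contract the time axis: out[c, h, w] = sum_t weights[t] * frames[t, c, h, w]
    return np.einsum("t,tchw->chw", weights, frames)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def cross_modal_gate(rgb_feat: np.ndarray, depth_feat: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a cross-modality interaction module.

    Each modality's features are scaled by a sigmoid gate computed from the
    other modality, and the two gated streams are summed. Inputs must have
    the same shape.
    """
    rgb_out = rgb_feat * sigmoid(depth_feat)    # depth modulates RGB
    depth_out = depth_feat * sigmoid(rgb_feat)  # RGB modulates depth
    return rgb_out + depth_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal((8, 3, 4, 4))     # 8 frames, 3 channels, 4x4
    uniform = np.full(8, 1.0 / 8)                # uniform weights = temporal mean
    print(aggregate_sequence(clip, uniform).shape)

    rgb_feat = rng.standard_normal((16, 2, 2))   # e.g. backbone features per stream
    depth_feat = rng.standard_normal((16, 2, 2))
    print(cross_modal_gate(rgb_feat, depth_feat).shape)
```

With uniform weights the aggregation reduces to plain temporal averaging. Non-uniform learned weights would let a model emphasize informative frames while still producing a single lightweight map that a 2D CNN backbone can consume.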
