Abstract

Most existing multi-modality action recognition methods rely excessively on appearance information or adopt direct static feature fusion, which leads to poor robustness and insufficient consideration of modality differences. To address these problems, we propose a two-stream adaptive weight integration network with a three-dimensional parallel attention module, PA-AWCNN. First, a three-dimensional Parallel Attention (PA) module is proposed to effectively extract features along the spatial, temporal, and channel dimensions while reducing cross-dimensional interference, thereby achieving better robustness. Second, a Common Feature-driven (CFD) feature integration module is proposed to dynamically integrate appearance and depth features with adaptive weights, exploiting modality differences to compensate for the deficiencies of each feature and thus balancing the influence of both. The proposed PA-AWCNN performs action recognition on the representative integrated feature produced by attention enhancement and feature integration; it not only attains higher recognition accuracy but also improves the ability to distinguish similar actions. Experiments show that the proposed method achieves performance comparable to state-of-the-art methods, with accuracies of 92.76% and 95.65% on the NTU RGB+D dataset and the SBU Kinect Interaction dataset, respectively. The code is publicly available at: https://github.com/Luu-Yao/PA-AWCNN.
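To make the two core ideas concrete, the following is a minimal PyTorch sketch of (a) attention branches applied in parallel over the channel, spatial, and temporal dimensions, and (b) adaptive-weight fusion of two modality features. It is an illustration of the general techniques named in the abstract, not the authors' implementation; all class names, layer sizes, kernel shapes, and the gating scheme are assumptions (the actual PA and CFD modules are defined in the paper and repository).

```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """Illustrative 3D parallel attention (hypothetical): channel, spatial,
    and temporal attention maps are each computed from the raw input and
    applied in parallel, so no branch is conditioned on another branch's
    output (the abstract's "reduced cross-dimensional interference")."""
    def __init__(self, channels):
        super().__init__()
        # Channel attention: squeeze-and-excitation-style bottleneck MLP.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
        )
        # Spatial / temporal attention from channel-pooled descriptors.
        self.spatial_conv = nn.Conv3d(2, 1, kernel_size=(1, 7, 7), padding=(0, 3, 3))
        self.temporal_conv = nn.Conv3d(2, 1, kernel_size=(7, 1, 1), padding=(3, 0, 0))

    def forward(self, x):                       # x: (N, C, T, H, W)
        n, c, _, _, _ = x.shape
        # Channel weights from global average pooling over T, H, W.
        ca = torch.sigmoid(self.channel_fc(x.mean(dim=(2, 3, 4)))).view(n, c, 1, 1, 1)
        # Mean- and max-pooled descriptors along the channel axis.
        desc = torch.cat([x.mean(dim=1, keepdim=True),
                          x.amax(dim=1, keepdim=True)], dim=1)   # (N, 2, T, H, W)
        sa = torch.sigmoid(self.spatial_conv(desc))              # (N, 1, T, H, W)
        ta = torch.sigmoid(self.temporal_conv(desc))
        # Parallel (not sequential) application: every branch attends to x itself.
        return x * ca + x * sa + x * ta

class AdaptiveFusion(nn.Module):
    """Illustrative adaptive-weight fusion (hypothetical): per-sample weights
    for the appearance and depth features are predicted from the features
    themselves, instead of being fixed, so the fusion can rebalance the two
    modalities input by input."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, f_rgb, f_depth):          # both: (N, dim)
        w = torch.softmax(self.gate(torch.cat([f_rgb, f_depth], dim=1)), dim=1)
        return w[:, 0:1] * f_rgb + w[:, 1:2] * f_depth
```

In this sketch the three attention branches are summed, and the fusion gate is a single softmax-normalized linear layer; the paper's modules may combine branches and compute weights differently (e.g., via the common-feature drive described for CFD).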
