Abstract

As computer performance has improved, deep learning has gradually expanded from 2D image tasks to 3D video tasks. Human action recognition is a typical 3D video task, in which a video is classified by capturing the characteristics of the human action it shows. However, most videos today are processed by encoding and decoding (compression) technology, which blurs motion details and makes human action recognition more difficult. To address this problem, we embed an attention mechanism in a 3D spatiotemporal CNN so that the network can “ignore” the blurred features introduced by video encoding and decoding. We verify the effectiveness of our method against a 3D CNN on the UCF101 and HMDB51 datasets. Our method also shows that the convolutional kernel implies channel-wise dependence. Although the improvement achieved by the proposed method is limited, we hope the channel attention mechanism can help others study neural networks.
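
The abstract does not specify how the channel attention is built, so the following is only a minimal sketch of one common design, a squeeze-and-excitation style gate applied to a 3D feature map; it is not the authors' implementation. The function name `channel_attention_3d`, the weight matrices `w1` and `w2`, and the `[C][T][H][W]` nested-list layout are all assumptions made for illustration. It shows the general idea: each channel is summarized by its average over time and space, a small gating network turns those summaries into per-channel weights in (0, 1), and the feature map is rescaled so that less informative channels are attenuated.

```python
import math


def channel_attention_3d(x, w1, w2):
    """Reweight the channels of a feature map x laid out as [C][T][H][W].

    w1: [C_reduced][C] weights of a bottleneck layer (hypothetical).
    w2: [C][C_reduced] weights of an expansion layer (hypothetical).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: average each channel over time, height and width.
    squeezed = []
    for channel in x:
        values = [v for frame in channel for row in frame for v in row]
        squeezed.append(sum(values) / len(values))

    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[[v * g for v in row] for row in frame] for frame in channel]
        for channel, g in zip(x, gates)
    ]
    return scaled, gates


# Toy input: 2 channels, 1 frame, 2x2 pixels (values chosen for illustration).
x = [
    [[[1.0, 1.0], [1.0, 1.0]]],  # channel 0: strong response
    [[[0.0, 0.0], [0.0, 0.0]]],  # channel 1: no response
]
w1 = [[1.0, 0.0]]        # 2 channels -> 1 hidden unit
w2 = [[2.0], [-2.0]]     # 1 hidden unit -> 2 channel gates
scaled, gates = channel_attention_3d(x, w1, w2)
```

In a real network the two weight matrices would be learned jointly with the 3D convolutions, and the gate would typically be inserted after a convolutional block; the hand-picked weights here only demonstrate that the gate favors channel 0 over channel 1.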

