Abstract

Scene knowledge plays an important role in visual analysis. For action recognition in particular, human activities often occur in specific scenes. However, it should be emphasised that the association between actions and scenes is complex: naively intensifying or suppressing scene knowledge to improve action recognition can instead hurt performance. In this article, we tackle this problem by proposing a new action recognition framework based on a Scene Adaptive Mechanism. Specifically, a Scene Knowledge Modulation module controls the feature extractors to either suppress or intensify scene knowledge, and an Adaptive Fusion Layer then dynamically regulates the role of scene information across the different visual feature sequences and fuses them. We abbreviate the resulting model as SAM-Net. Our method serves as a pluggable module that can be integrated into other backbones to further enhance their performance. We perform extensive experiments on three large datasets: Something-Something V1 & V2 and Kinetics-400. The quantitative and qualitative results demonstrate the effectiveness of SAM-Net, which substantially outperforms the baseline methods.
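The abstract does not describe the internals of these modules, so the sketch below is only an illustrative guess at how a channel-wise scene gate and a learned fusion layer could be composed in PyTorch. The class names, the sigmoid gate rescaled to the range (0, 2) so it can both suppress and intensify, and the softmax weighting over feature streams are all assumptions for illustration, not the authors' actual design.

```python
# Hypothetical sketch of the two SAM-Net components, assuming PyTorch.
# SceneKnowledgeModulation: a learned channel gate that can scale
# scene-related channels down (suppress, gate < 1) or up (intensify,
# gate > 1). AdaptiveFusionLayer: learns per-stream weights to fuse
# several modulated feature sequences. Shapes and names are assumptions.
import torch
import torch.nn as nn


class SceneKnowledgeModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Squeeze-and-excitation-style gate over channels.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # (B, C, H, W) -> (B, C, 1, 1)
            nn.Flatten(),              # -> (B, C)
            nn.Linear(channels, channels),
            nn.Sigmoid(),              # values in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) frame features.
        g = 2.0 * self.gate(x)          # (B, C), rescaled to (0, 2)
        return x * g[:, :, None, None]  # channel-wise modulation


class AdaptiveFusionLayer(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predicts one scalar score per stream from its pooled features.
        self.score = nn.Linear(channels, 1)

    def forward(self, streams: list[torch.Tensor]) -> torch.Tensor:
        # streams: list of (B, C, H, W) modulated feature maps.
        pooled = torch.stack([s.mean(dim=(2, 3)) for s in streams], dim=1)
        w = torch.softmax(self.score(pooled), dim=1)  # (B, S, 1)
        stacked = torch.stack(streams, dim=1)         # (B, S, C, H, W)
        return (w[:, :, :, None, None] * stacked).sum(dim=1)


# Usage sketch: modulate two feature streams, then fuse them adaptively.
mod = SceneKnowledgeModulation(channels=64)
fuse = AdaptiveFusionLayer(channels=64)
a, b = torch.randn(2, 64, 7, 7), torch.randn(2, 64, 7, 7)
fused = fuse([mod(a), mod(b)])  # -> (2, 64, 7, 7)
```

Under these assumptions, the gate decides per channel how much scene evidence to keep, while the fusion weights decide per input how much each stream contributes, which matches the abstract's claim that the role of scene information is regulated dynamically rather than fixed in advance.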
