Abstract

It is challenging for an intelligent system to recognize the actions recorded in an RGB video because of the large amount of information and the wide variations such video contains. Skeleton data, by contrast, focuses on the human body region but lacks information about interaction with the background, which makes it complementary to RGB data. Recent works combine RGB and skeleton data to boost action recognition performance. However, existing works omit the semantic information between joints, which is important for action recognition. In this paper, we propose a novel semantic-augmented local decision aggregation network for action recognition. Specifically, we regard the area around each body joint as an attention region and extract a local spatio-temporal feature for each joint. To exploit the semantic information between joints, we propose a semantic information module, which jointly encodes the spatial and temporal indices of body joints to enhance the representation ability of the local features. For better learning ability, instead of aggregating the local features, we first make a decision based on each individual local feature and then aggregate the local decisions for the final recognition, which reflects the idea of ensemble learning. Extensive experiments demonstrate the effectiveness of the proposed module, which improves action recognition performance on three commonly used datasets.

Keywords: Action recognition, Attention, Local features, Decision aggregation, Semantic information
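
The two ideas named in the abstract, augmenting each joint's local feature with its spatial and temporal index and aggregating per-joint decisions rather than features, can be illustrated with a minimal NumPy sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the additive index embeddings, the shared linear classifier, the mean over local decisions, and all function and parameter names are hypothetical and are not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def add_index_semantics(local_feats, joint_emb, frame_emb):
    # local_feats: (T, J, D) one local feature per frame and joint.
    # joint_emb:   (J, D) embedding of the joint (spatial) index.
    # frame_emb:   (T, D) embedding of the frame (temporal) index.
    # Assumption: the index information is injected by simple addition.
    return local_feats + joint_emb[None, :, :] + frame_emb[:, None, :]

def aggregate_local_decisions(local_feats, W, b):
    # Assumption: one shared linear classifier makes a decision per local feature.
    logits = local_feats @ W + b            # (T, J, C)
    probs = softmax(logits, axis=-1)        # per-feature class probabilities
    # Assumption: local decisions are aggregated by a plain average.
    return probs.reshape(-1, probs.shape[-1]).mean(axis=0)  # (C,)

# Toy usage with random values: 4 frames, 5 joints, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
T, J, D, C = 4, 5, 8, 3
feats = rng.normal(size=(T, J, D))
feats = add_index_semantics(feats, rng.normal(size=(J, D)), rng.normal(size=(T, D)))
scores = aggregate_local_decisions(feats, rng.normal(size=(D, C)), np.zeros(C))
predicted_class = int(scores.argmax())
```

Averaging class probabilities from many weak local classifiers is the ensemble-style step; a learned weighting of the local decisions would be an equally plausible alternative under the abstract's description.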
