Abstract

Human action recognition is an important branch of computer vision. Recognition from skeletal data is challenging because the joints carry complex spatiotemporal information. In this work, we propose an action recognition method that consists of three parts: a view-independent representation, combination with cumulative Euclidean distance, and a combined model. First, each action sequence is transformed into a representation that is independent of the camera view. Second, these representations are combined with cumulative Euclidean distances, so that the joints more closely associated with the action are emphasised. Then, a combined model extracts features from these representations and classifies the actions; it consists of a regular three-layer BLSTM network and a temporal attention module. Experimental results on two multi-view benchmark datasets, Northwestern-UCLA and NTU RGB+D, demonstrate the effectiveness of the complete method. Despite its simple architecture and the use of only one type of action feature, it significantly improves recognition performance and is strongly robust.
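The abstract names the pipeline's components but not their exact formulation. Below is a minimal PyTorch sketch of how such a pipeline might look: a cumulative-Euclidean-distance weighting of view-normalised joints followed by a three-layer BLSTM with temporal attention. The weighting scheme, layer sizes, and the form of the attention module are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


def cumulative_distance_weighting(seq):
    """Scale each joint by its cumulative Euclidean displacement over the
    sequence, so joints that move more contribute more (assumed scheme).

    seq: (frames, joints, 3) view-normalised joint coordinates.
    """
    disp = torch.linalg.norm(seq[1:] - seq[:-1], dim=-1)  # (frames-1, joints)
    cum = disp.sum(dim=0)                                 # (joints,)
    weights = cum / (cum.sum() + 1e-8)                    # normalise to sum 1
    return seq * weights[None, :, None]                   # broadcast per joint


class BLSTMWithTemporalAttention(nn.Module):
    """Three-layer bidirectional LSTM followed by a temporal attention
    module that pools frame features into a clip-level descriptor."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden_dim, num_layers=3,
                             batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # per-frame attention score
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                # x: (batch, frames, in_dim)
        h, _ = self.blstm(x)             # (batch, frames, 2*hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)  # weights over the time axis
        clip = (w * h).sum(dim=1)        # attention-weighted temporal pooling
        return self.fc(clip)             # class logits


# Usage sketch: 20 joints flattened to a 60-dim frame vector.
seq = cumulative_distance_weighting(torch.randn(64, 20, 3))
model = BLSTMWithTemporalAttention(in_dim=60, hidden_dim=128, num_classes=10)
logits = model(seq.reshape(1, 64, -1))  # (1, num_classes)
```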
