Abstract

Feature representation is of vital importance for human action recognition. In recent years, the application of deep learning to action recognition has become popular. However, for action recognition in videos, the advantage of a single convolutional feature over traditional hand-crafted features is not evident. In this paper, a novel feature representation that combines spatial-temporal features with global motion information is proposed. Specifically, spatial-temporal features are extracted from RGB images by a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Meanwhile, global motion information is extracted from motion difference images by a separate CNN. Here, the motion difference images are obtained by applying an exclusive-or (XOR) operation to binarized video frames. Finally, a support vector machine (SVM) is adopted as the classifier. Experimental results on the YouTube Action and UCF-50 datasets show the superiority of the proposed method.
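The motion difference images described above can be sketched as follows: binarize two consecutive grayscale frames and XOR them, so that only pixels that changed between frames remain set. This is a minimal illustrative sketch; the binarization threshold and frame layout are assumptions, since the abstract does not specify them.

```python
import numpy as np

def motion_difference(frame_a, frame_b, threshold=128):
    """XOR two binarized grayscale frames to highlight moving pixels.

    Illustrative sketch only: the threshold value (128) is an assumption,
    as the paper's exact binarization scheme is not given in the abstract.
    """
    bin_a = frame_a >= threshold
    bin_b = frame_b >= threshold
    # Pixels set in exactly one of the two binary frames indicate motion.
    return np.logical_xor(bin_a, bin_b).astype(np.uint8)

# Toy 4x4 "frames": a bright 2x2 block shifts one pixel to the right.
f1 = np.zeros((4, 4), dtype=np.uint8)
f1[1:3, 0:2] = 255
f2 = np.zeros((4, 4), dtype=np.uint8)
f2[1:3, 1:3] = 255
diff = motion_difference(f1, f2)
# Only the pixels the block entered or left are set (4 pixels in total);
# the overlapping column of the block cancels out under XOR.
```

In a full pipeline, such difference images would be stacked or averaged over a clip and fed to the second CNN as the global-motion input stream.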
