Abstract

Action recognition is now widely applied in many fields. However, an action is hard to define from a single modality. Unlike image recognition, action recognition needs several modalities to describe one action, such as appearance, motion, and dynamic information. Because the state of an action evolves over time, motion information must be considered when representing it. Most current methods define an action by spatial and motion information, and they rest on two key elements: spatial information obtained by sparsely sampling the video frame sequence, and motion content, mostly represented by optical flow computed on consecutive video frames. However, the connection between the two is weak in current methods. To strengthen this association, this paper presents a new three-stream architecture for obtaining multi-modality information. The advantages of our network are: (a) we propose a new sampling approach that samples evenly over the video sequence to acquire appearance information; (b) we use ResNet101 to obtain high-level, discriminative features; (c) we put forward a three-stream architecture to capture temporal, spatial, and dynamic information. Experimental results on the UCF101 dataset show that our method outperforms previous methods.
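The abstract does not say how the even sampling in (a) is carried out, so the following is only a minimal sketch of one common interpretation: split the video into equal-length segments and take the frame nearest the centre of each. The function name `even_sample_indices` and its parameters are hypothetical and do not come from the paper.

```python
from typing import List


def even_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Return frame indices spread evenly over a video of num_frames frames.

    Assumption (not stated in the abstract): the video is split into
    num_samples equal segments and the frame at each segment centre is taken.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment_length = num_frames / num_samples
    # Centre of segment i is at (i + 0.5) * segment_length; clamp to the last frame.
    return [
        min(num_frames - 1, int(segment_length * (i + 0.5)))
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # A 100-frame clip sampled at 4 positions.
    print(even_sample_indices(100, 4))
```

When the clip has fewer frames than the requested number of samples, this sketch repeats indices rather than failing, which keeps the input size to the appearance stream fixed.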
