Abstract

Video action recognition requires accurate analysis of an object's motion as well as its spatial appearance; in other words, a model must learn both temporal and spatial information. Many deep learning-based action recognition methods extract these two kinds of information with a multi-stream network, in which the temporal stream analyzes motion information computed by explicit mathematical operations. In this paper, we present an action recognition method based on a multi-stream network whose temporal path uses a deep learning-based temporal relation module to extract motion information over the entire video. The proposed method significantly increases action recognition accuracy by attaching modules in front of the 2D CNN and by late fusion with another network path. Because the proposed temporal stream network requires no additional mathematical operations, it greatly reduces the amount of computation. As a result, the proposed method is suitable for a wide range of real-time visual action recognition tasks.
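The abstract does not specify how the temporal path is combined with the other network path at the late-fusion stage. The following is a minimal sketch under one common assumption: each stream outputs per-class logits, and the fused prediction is a weighted average of the two streams' softmax probabilities. The function names (`late_fusion`, `predict`) and the `temporal_weight` parameter are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits: Sequence[float],
                temporal_logits: Sequence[float],
                temporal_weight: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of per-stream class probabilities."""
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must be in [0, 1]")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    # Each stream is classified independently; only the final scores are combined.
    return [(1.0 - temporal_weight) * s + temporal_weight * t
            for s, t in zip(p_spatial, p_temporal)]


def predict(spatial_logits: Sequence[float],
            temporal_logits: Sequence[float],
            temporal_weight: float = 0.5) -> int:
    # Return the index of the class with the highest fused probability.
    fused = late_fusion(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    spatial = [2.0, 1.0, 0.5]   # appearance stream prefers class 0
    temporal = [0.1, 0.2, 3.0]  # motion stream strongly prefers class 2
    print(predict(spatial, temporal))
```

With equal weights, the confident temporal stream outweighs the spatial stream, so the example prints class `2`. Fusing only the final scores keeps the two paths independent until the last step, which is one reason late fusion adds little computation.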
