Abstract

Video action recognition requires accurate analysis of motion information along with the spatial information of objects; in other words, both temporal and spatial information must be learned. Many deep learning-based action recognition methods extract temporal and spatial information with a multi-stream network, in which the temporal stream analyzes motion information using mathematical operations. In this paper, we present an action recognition method using a multi-stream network with a deep learning-based temporal relation module that extracts motion information for the entire video along the temporal network path. The proposed method significantly increases the accuracy of action recognition by attaching modules in front of the 2D CNN and applying late fusion with the other network path. Because the proposed temporal stream network requires no additional mathematical operations, the amount of computation is greatly reduced. As a result, the proposed method is suitable for a wide range of real-time visual action recognition tasks.
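To illustrate the late-fusion step described above, the following is a minimal sketch (not the authors' implementation) of fusing the class scores of a spatial stream and a temporal stream; the logits and class count are hypothetical placeholders.

```python
import numpy as np

def softmax(x):
    """Convert a vector of logits into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical per-class logits from each stream (5 action classes).
spatial_logits = np.array([2.0, 0.5, 0.1, -1.0, 0.3])   # spatial (appearance) stream
temporal_logits = np.array([1.5, 2.2, 0.0, -0.5, 0.1])  # temporal (motion) stream

# Late fusion: average the class probabilities of the two streams,
# then pick the highest-scoring action class.
fused = 0.5 * (softmax(spatial_logits) + softmax(temporal_logits))
predicted_class = int(np.argmax(fused))
```

Averaging probabilities after each stream's own classifier (rather than concatenating features early) lets each stream be trained and tuned independently, which is the usual motivation for late fusion in multi-stream networks.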
