Abstract

Human action recognition is an emerging goal of computer vision with several applications such as video surveillance and human-computer interaction. Despite many attempts to develop deep architectures to learn the spatio-temporal features of video, hand-crafted optical flow is still an important part of the recognition process. To engage the motion features deeply inside the learning process, we propose a spatio-temporal video recognition network where a motion-aware long short-term memory module is introduced to estimate the motion flow along with extracting spatio-temporal features. A specific optical flow estimator is subsumed which is based on kernelized cross correlation. The proposed network can be used without any extra learning process and there is no need to pre-compute and store the optical flow. Extensive experiments on two action recognition benchmarks verify the effectiveness of the proposed approach.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call