Abstract

An effective spatiotemporal representation of motion information is crucial to human action classification. Although most existing methods attempt to capture spatiotemporal structure and learn motion representations with deep neural networks, such representations fail to model actions over their full temporal extent. To address this problem, this paper proposes a global motion representation based on sequential low-rank tensor decomposition. Specifically, we model an action sequence as a third-order tensor with spatiotemporal structure. Through low-rank tensor decomposition, the partial motion of objects is preserved in a global context and then fed into a deep architecture that automatically learns global-term motion features. To simultaneously exploit static spatial features, short-term motion, and global-term motion in a video, we describe a multi-stream framework with recurrent convolutional architectures that is trainable end to end. We adopt the Gated Recurrent Unit (GRU) as our recurrent unit because it has fewer parameters than the Long Short-Term Memory (LSTM) unit. Extensive experiments were conducted on two challenging datasets: HMDB51 and UCF101. Experimental results show that our method achieves state-of-the-art performance on the HMDB51 dataset and is comparable to state-of-the-art methods on the UCF101 dataset.
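To make the decomposition step concrete, below is a minimal sketch in Python with NumPy (not the authors' implementation) of one way to compute a low-rank approximation of a clip modeled as a third-order tensor. The tensor shape, the chosen rank, and the temporal-mode unfolding via truncated SVD are illustrative assumptions, not details taken from the paper.

import numpy as np

def low_rank_temporal_approx(video, rank=5):
    """video: (T, H, W) tensor of frames; returns a rank-`rank` approximation."""
    T, H, W = video.shape
    # Unfold along the temporal mode: each row is one flattened frame.
    unfolded = video.reshape(T, H * W)
    # Truncated SVD keeps only the dominant components shared across frames,
    # i.e., the global structure of the motion, discarding fine local detail.
    U, s, Vt = np.linalg.svd(unfolded, full_matrices=False)
    approx = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]
    return approx.reshape(T, H, W)

# Usage: a random stand-in for a 16-frame, 64x64 grayscale clip.
clip = np.random.rand(16, 64, 64).astype(np.float32)
global_motion = low_rank_temporal_approx(clip, rank=3)
print(global_motion.shape)  # (16, 64, 64)

In this sketch, the low-rank reconstruction plays the role of the global motion representation that would then be fed to the deep multi-stream architecture described above.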
