Abstract

In this paper, we propose an efficient and effective action recognition framework by combining multiple feature models from dynamic image, optical flow and raw frame, with 3D convolutional neural network (CNN). Dynamic image preserves the long-term temporal information, while optical flow captures short-term temporal information, and raw frame represents the appearance information. Experiments demonstrate that dynamic image provides complementary information to raw frame feature and optical flow feature. Furthermore, with the approximate rank pooling, the computation of dynamic images is about 360 times faster than optical flow, and the dynamic image requires far less memory than optical flow and raw frame.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call