Abstract

Approaches based on Deep Convolutional Neural Networks (ConvNets) have recently achieved strong performance on human action recognition. A family of these approaches employs a two-stream architecture that exploits both appearance and motion information. However, the two-stream architecture has two drawbacks. First, it does not fully utilize the temporal information of the video, which is crucial for action recognition. Second, it lacks a mechanism to handle variations in action speed. To tackle these obstacles, in this paper we first propose a novel video dynamics mining strategy that takes advantage of motion tracking in the video. We then introduce a frame-skip scheme into the ConvNets, which stacks different modalities of optical flow to build a novel motion representation. Experiments on action recognition datasets such as UCF101 and HMDB51 show that the proposed video dynamics mining, together with the frame-skip ConvNets, can match or surpass some of the state-of-the-art alternatives in the task of action recognition.
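To make the frame-skip idea concrete, below is a minimal sketch (not the authors' implementation) of stacking optical flow computed at several frame skips into a multi-channel input for the motion stream: for each skip s, the horizontal and vertical flow between frame t and frame t+s contribute two channels. The Farneback flow estimator, the skip values, and the function name are illustrative assumptions.

```python
# Hypothetical sketch of a multi-skip optical-flow stack; not the paper's code.
import cv2
import numpy as np

def flow_stack(frames, skips=(1, 2, 4)):
    """frames: list of grayscale frames, each an (H, W) uint8 array,
    with at least max(skips) + 1 entries.
    Returns an (H, W, 2 * len(skips)) float32 stack centered on frame 0."""
    channels = []
    for s in skips:
        # Dense Farneback flow between frame 0 and the frame s steps later;
        # larger skips capture slower motions over a longer time span.
        flow = cv2.calcOpticalFlowFarneback(
            frames[0], frames[s], None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        channels.append(flow[..., 0])  # horizontal component
        channels.append(flow[..., 1])  # vertical component
    return np.stack(channels, axis=-1).astype(np.float32)

# Usage: after decoding a clip and converting frames to grayscale,
# stack = flow_stack(gray_frames)  # feed to the motion-stream ConvNet
```

Computing flow at multiple skips gives the network views of the motion at several temporal scales, which is one plausible way to gain robustness to speed variations.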
