Abstract

Recent research on action recognition has focused largely on coarse-grained actions, and fine-grained action recognition remains comparatively understudied. To address this gap, we propose a method for fine-grained action recognition based on a deep convolutional network. The method adopts the I3D network, which has achieved great success in coarse-grained action recognition, as its backbone architecture. In addition, the human pose and hand regions are extracted to capture the local features of fine-grained actions. The I3D network then extracts features from the RGB video frames, optical flow, human pose, and hand regions, and these features are finally fused. Because multiple input streams are fed into the I3D network, we call our method a Multi-stream I3D Network. We validate the method on the MPII Cooking 2 dataset and report the results in detail.
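To illustrate the multi-stream fusion idea described above, the following is a minimal PyTorch-style sketch. It assumes toy stand-in backbones in place of the actual I3D networks and a placeholder class count; the names (MultiStreamFusion, toy_backbone) are hypothetical and not from the paper, which only specifies that per-stream I3D features are extracted and combined.

import torch
import torch.nn as nn


class MultiStreamFusion(nn.Module):
    """Sketch of late fusion over several per-modality streams.

    The per-stream backbones here are hypothetical stand-ins; the paper's
    method uses I3D networks for each modality (RGB, flow, pose, hands).
    """

    def __init__(self, backbones, feat_dim, num_classes):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)
        # Concatenate per-stream features, then classify the fine-grained action.
        self.classifier = nn.Linear(feat_dim * len(backbones), num_classes)

    def forward(self, streams):
        # streams: one tensor per modality (e.g. RGB frames, optical flow,
        # pose maps, hand crops), each shaped (batch, channels, time, H, W).
        feats = [net(x) for net, x in zip(self.backbones, streams)]
        return self.classifier(torch.cat(feats, dim=1))


if __name__ == "__main__":
    # Hypothetical stand-in for an I3D backbone: global space-time average
    # pooling followed by a linear projection to a fixed-size feature vector.
    def toy_backbone(in_channels, feat_dim=256):
        return nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(in_channels, feat_dim),
        )

    # Four streams: RGB (3 ch), optical flow (2 ch), pose (3 ch), hands (3 ch).
    model = MultiStreamFusion(
        [toy_backbone(3), toy_backbone(2), toy_backbone(3), toy_backbone(3)],
        feat_dim=256,
        num_classes=10,  # placeholder class count, not the dataset's actual number
    )
    inputs = [
        torch.randn(1, 3, 16, 112, 112),
        torch.randn(1, 2, 16, 112, 112),
        torch.randn(1, 3, 16, 112, 112),
        torch.randn(1, 3, 16, 112, 112),
    ]
    print(model(inputs).shape)  # -> torch.Size([1, 10])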
