Abstract
Effective extraction of human body parts and operated objects participating in action is the key issue of fine-grained action recognition. However, most of the existing methods require intensive manual annotation to train the detectors of these interaction components. In this paper, we represent videos by mid-level patches to avoid the manual annotation, where each patch corresponds to an action-related interaction component. In order to capture mid-level patches more exactly and rapidly, candidate motion regions are extracted by motion saliency. Firstly, the motion regions containing interaction components are segmented by a threshold adaptively calculated according to the saliency histogram of the motion saliency map. Secondly, we introduce a mid-level patch mining algorithm for interaction component detection, with object proposal generation and mid-level patch detection. The object proposal generation algorithm is used to obtain multi-granularity object proposals inspired by the idea of the Huffman algorithm. Based on these object proposals, the mid-level patch detectors are trained by K-means clustering and SVM. Finally, we build a fine-grained action recognition model using a graph structure to describe relationships between the mid-level patches. To recognize actions, the proposed model calculates the appearance and motion features of mid-level patches and the binary motion cooperation relationships between adjacent patches in the graph. Extensive experiments on the MPII cooking database demonstrate that the proposed method gains better results on fine-grained action recognition.
Highlights
In recent years, fine-grained action recognition has attracted a substantial amount of research interest, as it plays an important role in human–computer interaction, smart homes, elderly/child care, medical surveillance, and robots [1,2,3,4,5,6,7,8]
This paper presents a fine-grained action recognition method based on motion saliency and mid-level patches
A fine-grained action recognition algorithm based on motion saliency and mid-level patches is proposed
Summary
Fine-grained action recognition has attracted a substantial amount of research interest, as it plays an important role in human–computer interaction, smart homes, elderly/child care, medical surveillance, and robots [1,2,3,4,5,6,7,8]. Most existing action recognition methods focus on coarse-grained actions, such as full-body activities in daily life, e.g., jumping and waving. Compared with coarse-grained action recognition, fine-grained action recognition is more challenging due to the complex human motions and interactions, small body movements, large intra-class variability, small inter-class variability, etc. Fine-grained action recognition is still in its infancy. In the motion process of fine-grained actions, human body parts always interact with operated objects that are almost small and have a large variety of categories [10,11].
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.