Abstract
Skeleton-based human action recognition has attracted much attention in computer vision. Most previous studies rely on fixed skeleton graphs, so they capture only the local physical dependencies among joints and miss implicit joint correlations. In addition, the same action can look very different from different views, and in some views keypoints are occluded, which causes recognition errors. This paper proposes an action recognition method based on a distance vector and multihigh view adaptive network (DV-MHNet) to address these problems. The multihigh (MH) view adaptive networks automatically determine the best observation view at different heights, obtain complete keypoint information for the current frame, and improve the robustness and generalization of the model when recognizing actions at different heights. On this basis, the distance vector (DV) mechanism establishes the relative distance and relative orientation between different keypoints in the same frame and between the same keypoint in different frames, which yields the global potential relationship of each keypoint. Finally, a spatial temporal graph convolutional network is constructed to learn the characteristics of the action from both spatial and temporal information. Ablation studies against the traditional spatial temporal graph convolutional network, and with and without the multihigh view adaptive networks, support the effectiveness of the model. The model is evaluated on two widely used action recognition benchmarks (NTU-RGB + D and PKU-MMD). Our method achieves better performance on both datasets.
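The distance vector idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the use of Euclidean distance, and the use of a unit vector for "relative orientation" are all assumptions made here for illustration. Each frame is assumed to be a list of (x, y, z) keypoints.

```python
import math


def distance_and_direction(a, b):
    """Return (distance, unit direction) from point a to point b.

    Assumption: "relative orientation" is modeled as a unit vector.
    A zero-length offset gets a zero direction to avoid division by zero.
    """
    diff = tuple(bi - ai for ai, bi in zip(a, b))
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0:
        return 0.0, (0.0, 0.0, 0.0)
    return dist, tuple(d / dist for d in diff)


def spatial_distance_vectors(frame):
    """Relate every keypoint to every other keypoint in the same frame."""
    features = {}
    for i, a in enumerate(frame):
        for j, b in enumerate(frame):
            if i != j:
                features[(i, j)] = distance_and_direction(a, b)
    return features


def temporal_distance_vectors(frame_a, frame_b):
    """Relate each keypoint to the same keypoint in another frame."""
    return [distance_and_direction(a, b) for a, b in zip(frame_a, frame_b)]


if __name__ == "__main__":
    # Hypothetical two-keypoint skeleton over two frames.
    frame_0 = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    frame_1 = [(0.0, 0.0, 2.0), (3.0, 4.0, 0.0)]
    print(spatial_distance_vectors(frame_0)[(0, 1)])
    print(temporal_distance_vectors(frame_0, frame_1)[0])
```

In this sketch the spatial features cover all keypoint pairs rather than only physically connected joints, which is one way to read the abstract's "global potential relationship of each keypoint".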
Highlights
Human action recognition is currently one of the most important tasks in computer vision, and it is widely used in human-computer interaction, video surveillance, video understanding [1], and virtual reality [2,3,4]
In order to solve the above problems, this paper proposes a human action recognition method based on distance vector and multihigh view adaptive networks
Although the above methods have significantly improved the performance of human action recognition, they ignore the potential relations between different joints in the same frame and the same joint in different frames, and they do not enhance the skeleton image from multiple views, which results in lower recognition accuracy
Summary
Human action recognition is currently one of the most important tasks in computer vision, and it is widely used in human-computer interaction, video surveillance, video understanding [1], and virtual reality [2,3,4]. Most previous methods encode the positions of the joints in each video frame, convert them into feature vectors, and perform pattern learning [9,10,11,12]. These methods ignore the potential connections between joints and lose much of the movement information. The network divides the joints into several parts and applies a different convolution kernel to each part. Although this method distinguishes the joints in different regions, it does not model the relationship between the current joint and every other joint, which causes some special actions to be recognized incorrectly. Because observation heights differ, the same action presents different content, so training on action data captured at only a single height is often not very robust.
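To make the point about observation height concrete, the sketch below re-observes one skeleton frame from several elevation angles by rotating it about the horizontal x-axis. This is only an illustration under stated assumptions: the paper's multihigh view adaptive networks determine the view automatically, whereas here the candidate angles are fixed by hand, and the function names and axis convention are hypothetical.

```python
import math


def rotate_about_x(frame, angle_rad):
    """Rotate every (x, y, z) keypoint about the x-axis by angle_rad.

    Assumption: tilting about the horizontal x-axis stands in for
    observing the same pose from a different height.
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in frame]


def multi_height_views(frame, angles_deg=(-30.0, 0.0, 30.0)):
    """Return one rotated copy of the frame per candidate elevation angle.

    The default angles are hypothetical; a learned model would choose them.
    """
    return [rotate_about_x(frame, math.radians(a)) for a in angles_deg]


if __name__ == "__main__":
    # Hypothetical single-keypoint frame viewed from three heights.
    for view in multi_height_views([(1.0, 1.0, 0.0)]):
        print(view)
```

Because a rotation preserves distances between keypoints, every view describes the same pose; only the coordinates the model sees change, which is why training on a single height can generalize poorly.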