Abstract

Skeleton-based action recognition methods have been widely developed in recent years, but occlusion remains a difficult problem. Existing methods usually assume complete skeleton data, and their performance drops sharply when the skeleton is occluded. To improve recognition accuracy on occluded skeleton data, we propose a multi-stream fusion graph convolutional network (MSFGCN). The network consists of multiple streams, with different streams handling different occlusion cases. In addition, joint coordinates, relative coordinates, small-scale temporal differences and large-scale temporal differences are extracted simultaneously to construct more discriminative multimodal features. In particular, to the best of our knowledge, we are the first to propose extracting temporal difference features at different scales simultaneously, which distinguishes more effectively between actions with different motion amplitudes. Experimental results show that the proposed MSFGCN achieves state-of-the-art performance on occluded skeleton datasets.
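
The four feature types named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the choice of reference joint (`center`), the step sizes (`small_step`, `large_step`), the zero-padding at the end of the sequence, and the concatenation along the channel axis are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def temporal_difference(joints, step):
    """Frame-to-frame displacement over `step` frames (hypothetical helper).

    joints: array of shape (T, V, C) = (frames, joints, coordinates).
    The last `step` frames have no later frame to compare with and stay zero.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    diff = np.zeros_like(joints)
    diff[:-step] = joints[step:] - joints[:-step]
    return diff


def build_features(joints, center=0, small_step=1, large_step=5):
    """Stack four feature types along the channel axis (illustrative only)."""
    # Relative coordinates: every joint expressed relative to an assumed
    # reference joint; slicing keeps the joint axis so broadcasting works.
    relative = joints - joints[:, center:center + 1, :]
    # Small-scale and large-scale temporal differences (assumed step sizes).
    small = temporal_difference(joints, small_step)
    large = temporal_difference(joints, large_step)
    return np.concatenate([joints, relative, small, large], axis=-1)


# Usage with random data standing in for a skeleton sequence:
# 30 frames, 25 joints, 3D coordinates.
rng = np.random.default_rng(0)
sequence = rng.standard_normal((30, 25, 3))
features = build_features(sequence)
print(features.shape)  # (30, 25, 12)
```

With a larger step, slow or large-amplitude motions produce bigger displacements than they do at a step of one frame, which is one plausible reading of why the two scales are extracted together.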
