Abstract

Recent developments in action recognition have greatly increased the number of action categories. Covering these categories requires a large number of videos that are expensive and laborious to annotate, so zero-shot action recognition (ZSAR) has become increasingly important. Current ZSAR methods fall into two main families: those that use RGB video data and those that use human skeleton data. Conventional approaches rely on only one of these modalities and ignore the other, which limits model accuracy. In this paper, we propose a three-stream graph convolutional network that processes both modalities: a two-stream graph convolutional network handles the RGB data, and a motion branch handles the skeleton data. The model combines the two outputs with a weighted sum to produce the final ZSAR prediction. Experiments on the UCF101 dataset show that our model achieves higher accuracy than a baseline model.
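
As a rough illustration of the fusion step described above, the sketch below combines per-class scores from an RGB branch and a skeleton branch with a weighted sum. The module names, the scalar weight `alpha`, and the stand-in branch networks are assumptions for illustration only, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class WeightedSumFusion(nn.Module):
    """Weighted-sum fusion of two modality branches (illustrative sketch).

    The branch modules and the fusion weight `alpha` are hypothetical
    placeholders; the abstract only specifies that the two branch outputs
    are combined with a weighted sum.
    """

    def __init__(self, rgb_branch: nn.Module, skeleton_branch: nn.Module,
                 alpha: float = 0.5):
        super().__init__()
        self.rgb_branch = rgb_branch            # e.g., a GCN over RGB features
        self.skeleton_branch = skeleton_branch  # e.g., a motion branch over joints
        self.alpha = alpha                      # fusion weight, tuned on held-out data

    def forward(self, rgb_input: torch.Tensor,
                skeleton_input: torch.Tensor) -> torch.Tensor:
        rgb_scores = self.rgb_branch(rgb_input)            # (batch, num_classes)
        skel_scores = self.skeleton_branch(skeleton_input)  # (batch, num_classes)
        # Weighted sum of the two branch outputs gives the final prediction.
        return self.alpha * rgb_scores + (1.0 - self.alpha) * skel_scores

# Usage with stand-in linear branches (purely illustrative):
rgb = nn.Linear(2048, 101)   # 101 classes, matching UCF101
skel = nn.Linear(256, 101)
model = WeightedSumFusion(rgb, skel, alpha=0.6)
scores = model(torch.randn(4, 2048), torch.randn(4, 256))  # (4, 101)
```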
