Abstract

Skeleton-based action sequences are widely used for understanding human behaviour because of their compact representation. Most existing works design Graph Convolutional Networks (GCNs) that integrate multiple input channels, rather than the original motion sequence alone, to improve final performance. However, few of them report the detailed effects of these input channels. In contrast, we systematically study the impact of different input channels and construct a more efficient GCN framework. We identify a complementary effect between the local frame channel and the global sequence channel, which is essential for improving action recognition accuracy. By coupling local frame and global sequence information with a classical spatial–temporal graph neural network such as MS-G3D, our framework achieves performance competitive with state-of-the-art (SOTA) methods on challenging benchmark datasets. Code will be available at https://github.com/movearbitrarily/multi-stream.
