Abstract

Two-person interaction recognition has become an area of growing interest in human action recognition. Graph convolutional networks (GCNs) using human skeleton data have been shown to be highly effective for action recognition. Most GCN-based methods focus on recognizing an individual person's actions on the basis of an intra-body graph. However, many of these methods do not represent the relation between two bodies, making it difficult to recognize human interactions accurately. In this work, we propose the multi-stream adaptive GCN using inter- and intra-body graphs (MAGCN-IIG) as a new method for human interaction recognition. To achieve highly accurate human interaction recognition, our method cooperatively utilizes two types of graphs: an inter-body graph and an intra-body graph. The inter-body graph, newly introduced in this paper, connects joints between the two bodies in addition to the intra-body connections. The adaptive GCN using the inter-body graph captures the relations of joints between two people, even between different types of joints located far from each other. Furthermore, through a multi-stream architecture, our method simultaneously captures both inter-body and intra-body relations in each of two units that represent the position and motion of people. Experiments on interaction recognition using two large-scale human action datasets, NTU RGB+D and NTU RGB+D 120, showed that our method recognizes human interactions more accurately than state-of-the-art methods.
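The inter-body graph described above can be illustrated with a small sketch. This is an assumption-laden illustration, not the paper's exact construction: the bone list is a tiny subset of the skeleton, and the inter-body edges here link only corresponding joints on the two bodies, whereas the paper's adaptive GCN additionally learns connections between different joint types. The 25-joints-per-person count follows the NTU RGB+D skeleton format.

```python
import numpy as np

N_JOINTS = 25  # joints per person in the NTU RGB+D skeleton format

# A few example intra-body bones (parent, child) -- a subset for brevity:
# spine base -> spine mid -> spine shoulder -> neck -> head.
bones = [(0, 1), (1, 20), (20, 2), (2, 3)]

def build_two_person_adjacency(bones, n_joints=N_JOINTS):
    """Adjacency over the 2*n_joints nodes of a two-person graph:
    intra-body edges form two diagonal blocks, inter-body edges fill
    the off-diagonal blocks."""
    n = 2 * n_joints
    A = np.zeros((n, n), dtype=np.float32)
    # Intra-body edges, duplicated for each person (block diagonal).
    for p in range(2):
        off = p * n_joints
        for i, j in bones:
            A[off + i, off + j] = A[off + j, off + i] = 1.0
    # Inter-body edges: here each joint is linked to the same joint on the
    # other body; an adaptive layer could learn further cross-body links.
    for k in range(n_joints):
        A[k, n_joints + k] = A[n_joints + k, k] = 1.0
    return A

A = build_two_person_adjacency(bones)  # shape (50, 50), symmetric
```

A graph convolution applied with this adjacency propagates features both along each skeleton and across the two bodies in a single step.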

Highlights

  • Human action recognition has been widely applied to many tasks in video understanding, such as video surveillance, manufacturing, and human-computer interaction [1]–[6].

  • Related works: we briefly review human interaction recognition in Section II-A and graph convolutional networks (GCNs) in Section II-B as works related to this article.

  • Proposed method: we present the details of our proposed method, which we call the multi-stream adaptive GCN using inter- and intra-body graphs (MAGCN-IIG).


Introduction

Human action recognition has been widely applied to many tasks in video understanding, such as video surveillance, manufacturing, and human-computer interaction [1]–[6]. Compared with approaches that use RGB images directly for action recognition, skeleton-based approaches are robust against changes in brightness and appearance and against background noise. The graph convolutional network (GCN) [22] was recently introduced to the field of action recognition in the form of the spatial-temporal GCN (ST-GCN) [23], which is more effective for action recognition than RNN- and CNN-based methods. The two-stream adaptive GCN (2s-AGCN) [24], an extended version of ST-GCN, has further improved recognition accuracy. It utilizes an adaptive graph convolutional layer that makes it possible to optimize the topology of the graph by considering the relations between joints that are not directly connected. 2s-AGCN utilizes joint information (graph nodes) and bone information (graph edges) because the lengths and directions of bones are informative and discriminative.
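The two ideas in this paragraph can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the adaptive layer is modeled as a fixed adjacency A plus a learnable adjacency B (so the layer can relate joints that are not physically connected), and the bone stream is computed as the vector difference between each child joint and its parent, which captures bone length and direction.

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_graph_conv(X, A, B, W):
    """X: (N, C) joint features; A: fixed skeleton adjacency with self-loops;
    B: learned adjacency (typically initialized near zero); W: (C, C_out)."""
    A_hat = A + B                              # adapted graph topology
    D = A_hat.sum(axis=1, keepdims=True) + 1e-6
    return (A_hat / D) @ X @ W                 # normalized propagation + transform

def bones_from_joints(joints, parent_of):
    """Bone vectors (length/direction information) from 3-D joint coordinates."""
    return np.stack([joints[c] - joints[p] for c, p in parent_of])

# Toy example: 4 joints in a chain, 3-D coordinates, 8 output channels.
A = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
B = np.zeros((4, 4))
B[0, 3] = B[3, 0] = 0.5                        # a "learned" long-range link
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 8))
out = adaptive_graph_conv(X, A, B, W)          # shape (4, 8)

bone_feats = bones_from_joints(X, [(1, 0), (2, 1), (3, 2)])  # shape (3, 3)
```

In a two-stream setup, one network would consume the joint features and another the bone features, with their class scores fused at the end.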
