Benefiting from an advanced visual system, humans naturally classify activities and predict motion within a short time. However, most existing computer vision studies consider these two tasks separately, resulting in an insufficient understanding of human actions. Moreover, the effects of view variation remain challenging for most existing skeleton-based methods, and existing graph operators cannot fully explore multiscale relationships. In this article, a versatile graph-based model (Vers-GNN) is proposed to handle both tasks simultaneously. First, a self-regulated skeleton representation scheme is proposed; it is among the first attempts to successfully integrate the idea of view adaptation into a graph-based human activity analysis system. Next, several novel graph operators are proposed to model the positional relationships and learn the abstract dynamics between different human joints and body parts. Finally, a practical multitask learning framework and a multiobjective self-supervised learning scheme are proposed to promote both tasks. Comparative experimental results show that Vers-GNN outperforms recent state-of-the-art methods on both tasks, with the best results reported to date on NTU RGB+D (cross-view: 97.2%), UWA3D (88.7%), and CMU (1000 ms: 1.13).
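The abstract only names the graph operators at a high level; below is a minimal, generic sketch of the multiscale idea (aggregating joint features over k-hop neighborhoods of a skeleton graph), not the specific operators defined in the article. The joint count, edge list, and the multiscale_gcn_layer function are illustrative assumptions.

```python
# Generic multiscale graph convolution over a skeleton graph (illustrative
# sketch only; NOT the Vers-GNN operators from the article).
import numpy as np

NUM_JOINTS = 5  # toy skeleton; e.g., NTU RGB+D skeletons have 25 joints
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]  # assumed bone connections

# Symmetrically normalized adjacency with self-loops:
# A_hat = D^{-1/2} (A + I) D^{-1/2}
A = np.zeros((NUM_JOINTS, NUM_JOINTS))
for i, j in EDGES:
    A[i, j] = A[j, i] = 1.0
A_hat = A + np.eye(NUM_JOINTS)
d = A_hat.sum(axis=1)
A_hat = A_hat / np.sqrt(np.outer(d, d))

def multiscale_gcn_layer(X, A_hat, weights):
    """Aggregate joint features over powers of A_hat (1-hop, 2-hop, ...).

    X: (num_joints, in_dim) joint features.
    weights: list of (in_dim, out_dim) matrices, one per scale.
    """
    out = 0.0
    A_k = np.eye(A_hat.shape[0])
    for W in weights:
        A_k = A_k @ A_hat          # k-th power of A_hat => k-hop relations
        out = out + A_k @ X @ W    # sum the contribution of each scale
    return np.maximum(out, 0.0)    # ReLU nonlinearity

# Usage: map 3-D joint coordinates to 8-D features over three scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(NUM_JOINTS, 3))
weights = [rng.normal(scale=0.1, size=(3, 8)) for _ in range(3)]
H = multiscale_gcn_layer(X, A_hat, weights)
print(H.shape)  # (5, 8)
```

Summing over powers of the normalized adjacency is one common way to capture multiscale joint relationships; the article's operators may differ in how scales are defined and combined.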