Abstract

Unsupervised human activity recognition (HAR) algorithms operating on motion capture (mocap) data often rely on spatial information alone and neglect the activity-specific information contained in the temporal sequences. In this work, we propose a new unsupervised algorithm for HAR from mocap data that leverages both the spatial and the temporal information embedded in activity sequences. To this end, we employ a shallow graph neural network (GNN) comprising a graph convolutional network and a gated recurrent unit, which aggregate the spatial and temporal features of the mocap sequences, respectively. Moreover, we encode the transformations of the human body through log-regularized kernel covariance descriptors linked to the trajectory movement maps of the mocap frames. These descriptors are then fused with the GNN features for downstream activity recognition. Finally, HAR is performed by a new unsupervised algorithm using a neighborhood Laplacian regularizer and a normalized dictionary learning approach. The generalizability of the proposed model is validated by training the GNN on one public dataset and testing it on the others, and its performance is evaluated on six publicly available human mocap datasets. Compared to existing approaches, the proposed model consistently improves activity recognition by 12%–30% across datasets.
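
To make the spatial-temporal feature extractor concrete, the following is a minimal PyTorch sketch of the kind of shallow GNN described above: one graph convolution over the skeleton joints per frame (spatial aggregation) followed by a gated recurrent unit over frames (temporal aggregation). The layer widths, joint count, and chain-shaped skeleton adjacency are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a shallow spatial-temporal GNN for mocap sequences:
# a graph convolution over joints per frame, then a GRU over frames.
# All sizes and the toy skeleton adjacency are illustrative assumptions.
import torch
import torch.nn as nn


def normalized_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + torch.eye(adj.size(0))
    deg = a_hat.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


class ShallowSpatioTemporalGNN(nn.Module):
    def __init__(self, adj: torch.Tensor, in_dim=3, gcn_dim=32, gru_dim=64):
        super().__init__()
        self.register_buffer("a_hat", normalized_adjacency(adj))
        self.gcn = nn.Linear(in_dim, gcn_dim)  # GCN weight matrix W
        self.gru = nn.GRU(adj.size(0) * gcn_dim, gru_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, joints, in_dim), e.g. 3-D joint coordinates.
        b, t, j, _ = x.shape
        h = torch.relu(self.a_hat @ self.gcn(x))  # spatial aggregation per frame
        h = h.reshape(b, t, j * h.size(-1))       # flatten joints per frame
        _, h_last = self.gru(h)                   # temporal aggregation over frames
        return h_last.squeeze(0)                  # (batch, gru_dim) sequence feature


# Toy usage: a 5-joint chain skeleton and two 20-frame sequences.
joints = 5
adj = torch.zeros(joints, joints)
for i in range(joints - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
model = ShallowSpatioTemporalGNN(adj)
features = model(torch.randn(2, 20, joints, 3))
print(features.shape)  # torch.Size([2, 64])
```

In the full pipeline, such per-sequence features would then be fused with the covariance descriptors before the dictionary-learning stage; that fusion and the unsupervised recognition step are not shown here.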
