Abstract

Deep learning-based algorithms have been very successful in skeleton-based human activity recognition, where skeleton data consist of the 2-D or 3-D coordinates of human body joints. Most existing skeleton-based activity recognition methods focus on designing new deep architectures to learn discriminative features, and they treat all body joints as equally important for recognition. However, the importance of each joint varies both as an activity unfolds within a video and across different activities. In this work, we hypothesize that selecting the relevant joints prior to recognition can improve the performance of existing deep learning-based recognition models. We propose a spatial hard attention method that aims to remove uninformative and/or misleading joints at each frame. We formulate joint selection as a Markov decision process and employ deep reinforcement learning to train the proposed spatial-attention-aware agent; no extra labels are needed for the agent's training. The agent takes as input a sequence of features extracted from the skeleton video and outputs a sequence of per-joint probabilities. The proposed method is a general framework that can be integrated with existing skeleton-based activity recognition methods to improve their performance. We obtain very competitive activity recognition results on three commonly used human activity recognition datasets.
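The abstract leaves several details open. It does not say how the agent's joint probabilities become hard selections, what the reward is, or which reinforcement learning algorithm is used. The following is only a minimal sketch under stated assumptions. It treats each probability as an independent Bernoulli keep/drop decision per joint per frame and zeroes out dropped joints. It then forms a REINFORCE-style surrogate loss from the log-probability of the sampled mask. The names `apply_hard_attention` and `reinforce_loss`, the Bernoulli interpretation, the zero-filling of dropped joints, and the use of REINFORCE are all hypothetical illustrations, not the authors' implementation. The reward could be, for example, a signal derived from a downstream recognizer's prediction, but that choice is also an assumption.

```python
import math
import random
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) coordinates of one body joint
Frame = List[Joint]                 # all joints observed in one frame


def apply_hard_attention(
    sequence: List[Frame],
    keep_probs: List[List[float]],
    rng: random.Random,
) -> Tuple[List[Frame], List[List[int]], float]:
    """Sample a keep/drop decision for every joint in every frame (assumed Bernoulli).

    keep_probs[t][j] is the hypothetical probability that joint j is kept at frame t.
    Dropped joints are replaced by the origin (an assumed masking convention).
    Returns the masked sequence, the binary mask, and the total log-probability
    of the sampled mask, which a policy-gradient update needs.
    """
    masked: List[Frame] = []
    mask: List[List[int]] = []
    log_prob = 0.0
    for frame, probs in zip(sequence, keep_probs):
        new_frame: Frame = []
        frame_mask: List[int] = []
        for joint, p in zip(frame, probs):
            # rng.random() is in [0, 1), so p == 1.0 always keeps and p == 0.0 always drops
            keep = 1 if rng.random() < p else 0
            # Bernoulli log-likelihood of the sampled action; never log(0) for p in [0, 1]
            log_prob += math.log(p if keep else 1.0 - p)
            frame_mask.append(keep)
            new_frame.append(joint if keep else (0.0, 0.0, 0.0))
        mask.append(frame_mask)
        masked.append(new_frame)
    return masked, mask, log_prob


def reinforce_loss(log_prob: float, reward: float, baseline: float = 0.0) -> float:
    """REINFORCE surrogate loss: minimizing it raises the log-probability of
    masks whose reward exceeds the baseline and lowers it otherwise."""
    return -(reward - baseline) * log_prob


if __name__ == "__main__":
    seq = [
        [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
        [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (3.0, 3.0, 3.0)],
    ]
    probs = [[0.9, 0.2, 0.7], [0.5, 0.8, 0.1]]  # made-up agent outputs
    masked, mask, lp = apply_hard_attention(seq, probs, random.Random(0))
    print(mask, lp, reinforce_loss(lp, reward=1.0, baseline=0.5))
```

In a full system, the probabilities would come from the agent network, the masked sequence would be fed to an existing recognizer, and gradients of the surrogate loss would flow into the agent's parameters through `log_prob`. Here plain floats are used only to show the sampling and the loss. Sampling hard masks (rather than soft weights) is what makes the selection non-differentiable and motivates a reinforcement learning formulation.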
