Abstract

Skeleton-based human action recognition has emerged recently thanks to the compactness of skeleton data and its robustness to appearance variations. Although impressive results have been obtained in recent years, the performance of skeleton-based action recognition methods must still improve before they can be deployed in real-time applications. Recently, a lightweight network structure named Double-feature Double-motion Network (DD-Net) has been proposed for skeleton-based human action recognition. The DD-Net runs at high speed and achieves state-of-the-art performance on hand and body actions. However, it is suited only to actions that correlate strongly with the global trajectories; it cannot distinguish actions that have a weak connection with them. In this paper, the authors propose TD-Net, an improved version of the DD-Net in which a new branch is added. The new branch takes the normalised coordinates of joints (NCJ) as input to enrich the spatial information. On five datasets for skeleton-based human activity recognition (MSR-Action3D, CMDFall, JHMDB, FPHAB, and NTU RGB + D), the TD-Net consistently outperforms the baseline model DD-Net. The proposed method also outperforms various state-of-the-art methods, both hand-designed and deep learning-based, on four of these datasets (MSR-Action3D, CMDFall, JHMDB, and FPHAB). Furthermore, the generalisation of the proposed method is confirmed through cross-dataset evaluation. To illustrate the potential use of the model for real-time human action recognition, the authors have deployed an application on an edge device. The experimental results show that the application can process up to 40 fps for pose estimation using MediaPipe, and that it takes only 0.04 ms to recognise an action from skeleton sequences.
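As a rough illustration of what a normalised-coordinates-of-joints (NCJ) input might look like, the sketch below centres every frame on a root joint and rescales the whole sequence to unit radius. The abstract does not specify the normalisation used by the TD-Net, so the scheme, the function name `normalise_joints`, the choice of root joint, and the array shapes are all assumptions made for illustration only.

```python
import numpy as np


def normalise_joints(seq, root=0, eps=1e-8):
    """Centre each frame on a root joint and scale the sequence to unit radius.

    seq: array of shape (frames, joints, dims).
    NOTE: this is an assumed normalisation, not the scheme from the paper.
    """
    seq = np.asarray(seq, dtype=np.float64)
    # Subtract the root joint of each frame so poses are translation-invariant.
    centred = seq - seq[:, root:root + 1, :]
    # Divide by the largest joint-to-root distance found in the whole sequence.
    scale = np.linalg.norm(centred, axis=-1).max()
    return centred / (scale + eps)


# Hypothetical clip: 32 frames, 15 joints, 3-D coordinates.
rng = np.random.default_rng(0)
clip = rng.normal(size=(32, 15, 3))
ncj = normalise_joints(clip)
```

Under this assumed scheme the root joint sits at the origin in every frame and no joint lies further than unit distance from it, so the features describe the pose rather than the global trajectory.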
