Abstract
Human action recognition is a complex problem that attracts growing interest from the scientific community due to its applicability in domains such as security and behavior analysis. At its core, the problem entails classifying an action into a finite set of classes. Neural network based approaches, and especially convolutional neural networks, are a good starting point for solving the problem of human action recognition. Due to their nature, they recognize spatio-temporal features very well, making them well suited to working with sequences of RGB images. This paper proposes three types of convolutional neural network architectures for human action recognition. The first is based on 2D kernels, the second on 3D kernels, and the third on TCN (Temporal Convolutional Network) units. Each architecture is presented with its structure, advantages, and disadvantages, along with metrics that measure its performance. The model based on 2D convolutions is the fastest, but it also has the lowest accuracy. The 3D convolution-based model is a good middle ground, useful in situations that require a fast classifier covering a variety of action classes. Finally, the TCN-based model performs close to some of the best existing models and represents a viable solution to the proposed problem: it can classify many actions in real time, using only RGB images of fairly low resolution. The three models were evaluated on the RGB part of the NTU RGB+D dataset. The 2D convolution-based model obtained an accuracy of 7.43% on the Cross-Subject split and 10.28% on the Cross-View split. The 3D convolution-based model obtained 58.77% on Cross-Subject and 56.11% on Cross-View. Finally, the TCN-based model obtained an accuracy of 80.45% on Cross-Subject and 82.57% on Cross-View.
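To make the strongest of the three architecture families concrete, the sketch below shows a minimal PyTorch-style TCN-based classifier in which per-frame 2D convolutional features feed a stack of dilated causal TCN units. This is an illustrative sketch only: the layer sizes, number of TCN units, pooling choices, and the 60-class output (assuming the original NTU RGB+D label set) are assumptions for demonstration, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TCNBlock(nn.Module):
    """One temporal convolutional (TCN) unit: a dilated causal 1D convolution
    over the time axis with a residual connection."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        # Left-padding keeps the convolution causal (no future frames are used).
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames)
        out = self.relu(self.conv(F.pad(x, (self.pad, 0))))
        return out + x  # residual connection

class TCNActionClassifier(nn.Module):
    """Per-frame 2D CNN features followed by stacked TCN units and a
    classification head over the action classes (illustrative sizes)."""
    def __init__(self, num_classes: int = 60, feat_dim: int = 128):
        super().__init__()
        # Illustrative per-frame feature extractor based on 2D convolutions.
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Stacked TCN units with growing dilation widen the temporal receptive field.
        self.tcn = nn.Sequential(*[TCNBlock(feat_dim, dilation=2 ** i) for i in range(3)])
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, frames, 3, H, W) -- a sequence of RGB images
        b, t = clip.shape[:2]
        feats = self.frame_cnn(clip.flatten(0, 1)).view(b, t, -1)  # (b, t, feat_dim)
        feats = self.tcn(feats.transpose(1, 2))                    # (b, feat_dim, t)
        return self.head(feats.mean(dim=2))                        # average over time

# Example: a batch of two 16-frame clips at a fairly low 112x112 resolution.
logits = TCNActionClassifier()(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 60])

The growing dilation rates (1, 2, 4) are what let a small stack of TCN units cover a long span of frames without 3D kernels, which is the property the abstract credits for combining high accuracy with real-time operation.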