Gesture Recognition using FastDTW and Deep Learning Methods in the MSRC-12 and the NTU RGB+D Databases

Júlia Schubert Peixoto,Daniel Fernando Tello Gamarra,Marco Antonio De Souza Leite Cuadros,Daniel Welfer,Anselmo Rafael Cukla

doi:10.1109/tla.2022.9878175

Abstract

This work explores the use of three deep learning methods for gesture recognition: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) using Fast Dynamic Time Warping (FastDTW). The gestures were captured by Kinect sensors, two skeleton-based databases are used: Microsoft Research Cambridge-12 (MSRC-12) and NTU RGB+D. Also, the FastDTW technique was also employed to standardize the input size of the data. The MSRC-12 database achieved an accuracy rate of 82,36% in the test set with the CNN, the LSTM achieved an accuracy rate of 87,30% also in the test set, and in GRU the accuracy achieved in the test set was 89,34%. With the NTU RGB+D database, two evaluation methods were used: Cross-View and Cross-Subject. In the test set with Cross-View evaluation was obtained an accuracy rate of 63,53%, 55,14%, and 61,00%, with CNN, LSTM, and GRU respectively; and with the Cross-Subject evaluation method, it was achieved an accuracy rate of 66,19%, 64,43% and 60,17% in the test set on CNN, LSTM and GRU, respectively.

Full Text