A 3DCNN-LSTM Multi-Class Temporal Segmentation for Hand Gesture Recognition

Letizia Gionfrida, Anil A Bharath, Wan M R Rusli, Angela E Kedgley

doi:10.3390/electronics11152427

Open Access

https://doi.org/10.3390/electronics11152427

Journal: Electronics
Publication Date: Aug 4, 2022
Citations: 4
License type: CC BY 4.0

Affiliation: Imperial College London, Harvard University

Abstract

This paper introduces a multi-class hand gesture recognition model developed to identify a set of hand gesture sequences from two-dimensional RGB video recordings, using both the appearance and the spatiotemporal parameters of consecutive frames. The classifier combines a convolutional network with a long short-term memory (LSTM) unit. To offset the need for a large-scale dataset, the model is trained on a public dataset and then fine-tuned on the hand gestures of relevance using transfer learning. Validation curves obtained with a batch size of 64 indicate an accuracy of 93.95% (±0.37) with a mean Jaccard index of 0.812 (±0.105) for 22 participants. The fine-tuned architecture illustrates the possibility of refining a model with a small set of data (113,410 fully labelled image frames) to cover previously unknown hand gestures. The main contribution of this work is a custom hand gesture recognition network, driven by monocular RGB video sequences, that outperforms previous temporal segmentation models while retaining a small-sized architecture that facilitates wide adoption.
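The Jaccard index reported above measures the overlap between predicted and ground-truth gesture segments. The abstract does not state how the index was computed, so the following is only a minimal sketch under stated assumptions: labels are assigned per frame, the index is computed per gesture class as intersection over union of frame sets, and the mean is taken over the classes present in a recording. The function names and the gesture labels ("rest", "fist") are hypothetical and chosen for illustration.

```python
from typing import Hashable, Sequence


def jaccard_index(pred: Sequence[Hashable], truth: Sequence[Hashable], label: Hashable) -> float:
    """Intersection over union of the frames assigned to `label`."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must cover the same frames")
    pred_frames = {i for i, p in enumerate(pred) if p == label}
    truth_frames = {i for i, t in enumerate(truth) if t == label}
    union = pred_frames | truth_frames
    if not union:
        # Assumed convention: a label absent from both sequences counts as full agreement.
        return 1.0
    return len(pred_frames & truth_frames) / len(union)


def mean_jaccard(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Average the per-class Jaccard index over every label seen in either sequence."""
    labels = set(pred) | set(truth)
    if not labels:
        raise ValueError("at least one labelled frame is required")
    return sum(jaccard_index(pred, truth, label) for label in labels) / len(labels)


# Hypothetical six-frame recording with two gesture classes.
truth = ["rest", "rest", "fist", "fist", "fist", "rest"]
pred = ["rest", "fist", "fist", "fist", "rest", "rest"]

print(jaccard_index(pred, truth, "fist"))  # 2 shared frames out of 4 in the union: 0.5
print(mean_jaccard(pred, truth))           # both classes score 0.5, so the mean is 0.5
```

In this toy recording the predicted "fist" segment is shifted one frame early, which costs one frame at each boundary and halves the overlap. This shows why the Jaccard index is a stricter measure of temporal segmentation quality than frame-level accuracy, which would be 4/6 here.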

Full Text