Abstract

Convolutional Neural Networks (CNNs) are effective at extracting image features, while the Long Short-Term Memory (LSTM) network is a natural choice for modeling time sequences. We combine these two methods into an end-to-end model for gesture recognition. In this study, we propose a neural network structure that uses a general CNN to extract frame-level spatial features and an LSTM to extract temporal features. We name this network the Temporal Convolution Neural Network (TCNN). Our experiments are performed on the VIVA Gesture Dataset, which contains 19 gestures performed by 8 subjects. Under 8-fold cross-validation, the proposed network structure outperforms state-of-the-art methods such as 3DCNN. We also compare results obtained with different backbone CNNs: the network based on ResNet50 achieves an accuracy of 82.3%, while the lighter and shallower MobileNet achieves 60%.
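The abstract describes a two-stage pipeline: a CNN backbone produces a feature vector per video frame, and an LSTM consumes those vectors in temporal order before a final classifier predicts one of 19 gesture classes. The sketch below illustrates that data flow only; it is not the paper's actual network. The CNN backbone is replaced by a random linear projection, all weights are random, and the tiny dimensions (`FEAT`, `HID`, the 4-"pixel" frames) are placeholders chosen for readability, not values from the paper.

```python
import math
import random

random.seed(0)
FEAT, HID, CLASSES, T = 8, 16, 19, 5  # toy sizes; 19 matches the gesture count

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Stand-in for a CNN backbone (e.g. ResNet50 or MobileNet in the paper):
# here just a linear projection of a flattened 4-"pixel" frame to FEAT dims.
W_cnn = rand_mat(FEAT, 4)

def cnn_features(frame):
    return matvec(W_cnn, frame)

# Minimal LSTM cell: input (i), forget (f), output (o), candidate (c) gates.
Wx = {g: rand_mat(HID, FEAT) for g in "ifoc"}
Wh = {g: rand_mat(HID, HID) for g in "ifoc"}

def lstm_step(x, h, c):
    i = [sigmoid(a + b) for a, b in zip(matvec(Wx["i"], x), matvec(Wh["i"], h))]
    f = [sigmoid(a + b) for a, b in zip(matvec(Wx["f"], x), matvec(Wh["f"], h))]
    o = [sigmoid(a + b) for a, b in zip(matvec(Wx["o"], x), matvec(Wh["o"], h))]
    g = [math.tanh(a + b) for a, b in zip(matvec(Wx["c"], x), matvec(Wh["c"], h))]
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

W_out = rand_mat(CLASSES, HID)

def classify(video):
    """video: list of T frames -> probability over the 19 gesture classes."""
    h, c = [0.0] * HID, [0.0] * HID
    for frame in video:                      # CNN per frame, then LSTM in time
        h, c = lstm_step(cnn_features(frame), h, c)
    logits = matvec(W_out, h)                # classify from the last hidden state
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]             # softmax

video = [[random.random() for _ in range(4)] for _ in range(T)]
probs = classify(video)
```

With untrained random weights the output is of course arbitrary; the point is only the shape of the computation: spatial features frame by frame, temporal aggregation by the LSTM, and a softmax over the gesture classes at the end.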
