Abstract

Continuous gesture recognition aims to recognize ongoing gestures from continuous gesture sequences, and it is particularly meaningful for practical scenarios in which the start and end frames of each gesture instance are generally unknown. This paper presents an effective deep architecture for continuous gesture recognition. First, continuous gesture sequences are segmented into isolated gesture instances using the proposed temporal dilated Res3D network. A balanced squared hinge loss function is proposed to handle the imbalance between boundary and non-boundary frames. Temporal dilation preserves the temporal information needed for dense, fine-grained boundary detection, and the resulting large temporal receptive field makes the segmentation results more reasonable and effective. Then, a recognition network is constructed from a 3-D convolutional neural network (3DCNN), a convolutional long short-term memory network (ConvLSTM), and a 2-D convolutional neural network (2DCNN) for isolated gesture recognition. This “3DCNN-ConvLSTM-2DCNN” architecture learns long-term and deep spatiotemporal features more effectively. The proposed segmentation and recognition networks achieve a Jaccard index of 0.7163 on the ChaLearn LAP ConGD dataset, which is 0.106 higher than the winner of the 2017 ChaLearn LAP Large-Scale Continuous Gesture Recognition Challenge.
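The abstract does not spell out the balanced squared hinge loss, so the following is only an illustrative sketch of the idea it describes: because boundary frames are far rarer than non-boundary frames, the squared hinge loss is averaged per class before combining, so the majority class cannot dominate. The margin value, the ±1 label convention, and the equal-weight combination are assumptions, not the authors' exact formulation.

```python
def balanced_squared_hinge_loss(scores, labels, margin=1.0):
    """Illustrative class-balanced squared hinge loss (sketch, not the
    paper's exact formulation).

    scores: per-frame boundary scores (floats)
    labels: +1 for boundary frames, -1 for non-boundary frames
    Each class's squared hinge loss is averaged separately, so the
    scarce boundary frames are not swamped by non-boundary frames.
    """
    pos = [(s, y) for s, y in zip(scores, labels) if y == 1]
    neg = [(s, y) for s, y in zip(scores, labels) if y == -1]

    def mean_sq_hinge(pairs):
        if not pairs:
            return 0.0
        return sum(max(0.0, margin - y * s) ** 2 for s, y in pairs) / len(pairs)

    # Equal weighting of the two per-class means (an assumed balancing choice).
    return 0.5 * (mean_sq_hinge(pos) + mean_sq_hinge(neg))
```

For example, confidently correct scores on both classes give zero loss, while an uncertain score on the rare boundary class contributes as much as the entire non-boundary class would.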
