Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Hao Zhou, Wengang Zhou, Yun Zhou, Houqiang Li

doi:10.1609/aaai.v34i07.7001

Abstract

Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features and ignore other potentially non-trivial and informative content. This characteristic heavily constrains their capability to learn the implicit visual grammar behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes the visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aim to preserve the uniqueness of each cue and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
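
The two TMC paths can be illustrated with a toy example. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: each cue is a sequence of T frame feature vectors, the intra-cue path applies a temporal convolution to every cue separately, and the inter-cue path concatenates the cues frame by frame, mixes channels with a linear map, and then applies the same temporal convolution. The function names (`temporal_conv`, `tmc_block`, etc.), the shared kernel, the channel-mixing weights and the ReLU placement are hypothetical choices for illustration; the abstract does not specify the actual layers, dimensions or fusion scheme.

```python
from typing import List

# A sequence of T frames, each a feature vector of C channels.
Sequence = List[List[float]]


def temporal_conv(seq: Sequence, kernel: List[float]) -> Sequence:
    """Depthwise 1D convolution along time with zero padding.

    Each channel is filtered independently with the same kernel; an odd
    kernel length keeps the output length equal to the input length.
    """
    pad = len(kernel) // 2
    num_frames, num_channels = len(seq), len(seq[0])
    out = []
    for t in range(num_frames):
        frame = []
        for c in range(num_channels):
            acc = 0.0
            for j, w in enumerate(kernel):
                src = t + j - pad
                if 0 <= src < num_frames:  # zero padding outside the sequence
                    acc += w * seq[src][c]
            frame.append(acc)
        out.append(frame)
    return out


def mix_channels(seq: Sequence, weight: List[List[float]]) -> Sequence:
    """Per-frame linear map (out_dim x in_dim) that mixes channels across cues."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weight] for frame in seq]


def relu(seq: Sequence) -> Sequence:
    """Element-wise ReLU."""
    return [[max(0.0, x) for x in frame] for frame in seq]


def concat_cues(cues: List[Sequence]) -> Sequence:
    """Concatenate the features of all cues for each frame into one vector."""
    return [[x for cue in cues for x in cue[t]] for t in range(len(cues[0]))]


def tmc_block(cues: List[Sequence], kernel: List[float], mix_weight: List[List[float]]):
    """Hypothetical two-path block: intra-cue keeps cues separate, inter-cue fuses them."""
    intra = [relu(temporal_conv(cue, kernel)) for cue in cues]
    inter = relu(temporal_conv(mix_channels(concat_cues(cues), mix_weight), kernel))
    return intra, inter


if __name__ == "__main__":
    hand = [[1.0], [2.0], [3.0]]  # toy "hand" cue: 3 frames, 1 channel
    face = [[0.0], [1.0], [0.0]]  # toy "face" cue: 3 frames, 1 channel
    intra, inter = tmc_block([hand, face], kernel=[0.5, 1.0, 0.5], mix_weight=[[1.0, -1.0]])
    print(intra)  # [[[2.0], [4.0], [4.0]], [[0.5], [1.0], [0.5]]]
    print(inter)  # [[1.5], [3.0], [3.5]]
```

In this sketch the intra-cue path never mixes channels between cues, so each cue keeps its own representation, while the inter-cue path is the only place where the cues interact.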

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 114

Similar Papers

Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation
Hao Zhou ... Houqiang Li
IEEE Transactions on Multimedia | VOL. 24
17 Feb 2021

Recurrent Convolutional Neural Networks for Continuous Sign Language Recognition by Staged Optimization
Runpeng Cui ... Changshui Zhang
01 Jul 2017

Improving Continuous Sign Language Recognition with Consistency Constraints and Signer Removal
Ronglai Zuo ... Brian Mak
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 20
08 Mar 2024

Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition
Shiqing Zhang ... Shiliang Zhang
06 Jun 2016