Abstract

Sign Language Recognition (SLR) has become an appealing topic in modern societies because such technology could help bridge the communication gap between deaf and hearing people. Although important steps have been made towards the development of real-world SLR systems, signer-independent SLR remains one of the main bottlenecks of this research field. In this regard, we propose a deep neural network with an adversarial training objective, specifically designed to address the signer-independent problem. The proposed model consists of an encoder, which maps input images to latent representations, and two classifiers operating on these representations: (i) the sign-classifier, which predicts the class/sign labels, and (ii) the signer-classifier, which predicts the signer identities. During learning, the encoder is trained to help the sign-classifier as much as possible while simultaneously trying to fool the signer-classifier. This adversarial training procedure yields signer-invariant latent representations that remain highly discriminative for sign recognition. Experimental results demonstrate the effectiveness of the proposed model and its capability of dealing with large inter-signer variations.
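To make the adversarial objective concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes softmax classifiers trained with cross-entropy, a hypothetical trade-off weight `lam`, and a min-max formulation in which the signer-classifier minimizes its own cross-entropy while the encoder minimizes the sign loss minus `lam` times the signer loss. The function names are illustrative only.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def adversarial_losses(
    sign_logits: Sequence[Sequence[float]],
    signer_logits: Sequence[Sequence[float]],
    sign_labels: Sequence[int],
    signer_labels: Sequence[int],
    lam: float = 0.1,  # hypothetical trade-off weight (an assumption)
) -> tuple[float, float]:
    """Batch-averaged losses for a hypothetical min-max formulation.

    Returns (encoder_loss, signer_loss):
      - signer_loss: cross-entropy of the signer-classifier, which that
        classifier minimizes in order to identify the signer;
      - encoder_loss: sign cross-entropy minus lam * signer_loss, which the
        encoder (together with the sign-classifier) minimizes, i.e. it helps
        sign recognition while making the signer harder to identify.
    """
    n = len(sign_labels)
    sign_loss = sum(cross_entropy(z, y) for z, y in zip(sign_logits, sign_labels)) / n
    signer_loss = sum(cross_entropy(z, s) for z, s in zip(signer_logits, signer_labels)) / n
    return sign_loss - lam * signer_loss, signer_loss


# Toy batch: one sample, 2 sign classes, 3 signers, all logits equal.
enc_loss, sgn_loss = adversarial_losses([[0.0, 0.0]], [[0.0, 0.0, 0.0]], [0], [1])
print(round(enc_loss, 4), round(sgn_loss, 4))  # 0.5833 1.0986
```

In an actual training loop these quantities would be computed with an autodiff framework, and the two players would be updated in alternating steps (or jointly through a gradient-reversal layer). Because `-lam * signer_loss` is unbounded below, a common alternative is to train the encoder to push the signer-classifier's predictions toward the uniform distribution; the text above does not say which variant the authors use.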

Highlights

  • Sign languages are the naturally occurring linguistic systems that arise within a Deaf community and are currently considered the standard means of education for deaf people worldwide

  • This paper presents a novel adversarial training objective, based on representation learning and deep neural networks, designed to tackle the signer-independent Sign Language Recognition (SLR) problem

  • The underlying idea is to learn signer-invariant latent representations that preserve as much information as possible about the signs, while discarding the signer-specific traits that are irrelevant for sign recognition


Summary

INTRODUCTION

Sign languages are the naturally occurring linguistic systems that arise within a Deaf community and are currently considered the standard means of education for deaf people worldwide. To address the signer-independent SLR problem, we propose a deep model trained with an adversarial objective. The underlying idea is to preserve as much information as possible about the signs while discarding the signer-specific information that is implicitly present in the manual signing process. For this purpose, the proposed deep model is composed of an encoder network, which maps the input images to latent representations, and two discriminative classifiers operating on top of these representations, namely the sign-classifier network and the signer-classifier network. To further constrain the latent representations to be signer-invariant, we introduce an additional training objective that operates on the hidden representations of the encoder network and enforces the latent distributions of different signers to be as similar as possible.
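The additional objective can be illustrated with a deliberately simple stand-in. The exact form of the signer-transfer term is not given here, so this sketch assumes one: it penalizes the average squared Euclidean distance between the mean latent vectors (centroids) of every pair of signers in a batch. The helper names `signer_means` and `signer_transfer_penalty` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Sequence


def signer_means(
    latents: Sequence[Sequence[float]], signer_ids: Sequence[int]
) -> dict[int, list[float]]:
    """Mean latent vector (centroid) for each signer present in the batch."""
    groups: dict[int, list[Sequence[float]]] = defaultdict(list)
    for z, s in zip(latents, signer_ids):
        groups[s].append(z)
    return {
        s: [sum(z[d] for z in zs) / len(zs) for d in range(len(zs[0]))]
        for s, zs in groups.items()
    }


def signer_transfer_penalty(
    latents: Sequence[Sequence[float]], signer_ids: Sequence[int]
) -> float:
    """Average squared distance between the centroids of every pair of signers.

    Hypothetical first-moment version of a 'make signer distributions similar'
    term: it is zero when all signers share the same centroid, and it ignores
    differences in spread or shape.
    """
    means = signer_means(latents, signer_ids)
    pairs = list(combinations(sorted(means), 2))
    if not pairs:
        return 0.0  # only one signer in the batch: nothing to align
    total = sum(
        sum((x - y) ** 2 for x, y in zip(means[a], means[b])) for a, b in pairs
    )
    return total / len(pairs)


# Two signers whose codes share the centroid (1, 0): no penalty.
print(signer_transfer_penalty([[0, 0], [2, 0], [1, 1], [1, -1]], [0, 0, 1, 1]))  # 0.0
```

Under this assumption, the term would be added to the encoder's loss with a further hypothetical weight. Matching only centroids is a weak notion of distribution similarity; a distribution-level distance such as the maximum mean discrepancy (MMD) would also align higher moments.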

RELATED WORK
PROPOSED METHOD
Architecture
Adversarial Training
Signer-Transfer Training Objective
EXPERIMENTAL EVALUATION
Implementation Details
Results and Discussion
Transfer Learning
Ablation Study
Latent Space Visualization
Cluster Analysis in the Latent Space
Training Behavior of the Proposed Model
CONCLUSION