Abstract

We propose a sign language translation system based on human keypoint estimation. It is well known that many problems in computer vision require massive datasets to train deep neural network models. The situation is even worse for sign language translation, where high-quality training data is far more difficult to collect. In this paper, we introduce the KETI (Korea Electronics Technology Institute) sign language dataset, which consists of 14,672 high-resolution, high-quality videos. Since each country has its own unique sign language, the KETI sign language dataset can serve as a starting point for further research on Korean sign language translation. Using the KETI sign language dataset, we develop a neural network model that translates sign videos into natural language sentences by utilizing human keypoints extracted from the face, hands, and body. The resulting keypoint vector is normalized by the mean and standard deviation of the keypoints and used as input to our translation model, which is based on the sequence-to-sequence architecture. We show that our approach is robust even when the amount of training data is limited. Our translation model achieves 93.28% translation accuracy on the validation set and 55.28% on the test set for 105 sentences that can be used in emergency situations. We also compare several variants of our neural sign translation model based on different attention mechanisms in terms of classical metrics for measuring translation performance.
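The preprocessing step described above, normalizing the keypoint feature vector by its mean and standard deviation, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the keypoint count (137 points, in the style of an OpenPose-like detector covering body, face, and both hands) and the flattened (x, y) layout are assumptions not fixed by the abstract.

```python
import numpy as np

def normalize_keypoints(keypoints: np.ndarray) -> np.ndarray:
    """Normalize a 1-D keypoint feature vector by its mean and std.

    `keypoints` is assumed to hold the flattened (x, y) coordinates of the
    face, hand, and body keypoints for a single frame; the exact layout
    depends on the keypoint detector used.
    """
    mean = keypoints.mean()
    std = keypoints.std()
    # Guard against a degenerate all-constant vector.
    if std == 0:
        return np.zeros_like(keypoints)
    return (keypoints - mean) / std

# Hypothetical usage: 137 keypoints with (x, y) coordinates -> 274-dim vector.
frame_keypoints = np.random.rand(274).astype(np.float32)
features = normalize_keypoints(frame_keypoints)
```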

Highlights

  • The inability to hear is a major obstacle to smooth and natural communication for hearing-impaired people in a predominantly hearing world

  • Using the KETI sign language dataset, we present a sign language translation model that combines a well-known off-the-shelf human keypoint detector with a sequence-to-sequence translation model (a minimal sketch of such a model follows this list)

  • We introduce a new sign language dataset, manually annotated with Korean spoken language sentences, and propose a neural sign language translation model based on the sequence-to-sequence architecture
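As referenced in the highlights above, a sequence-to-sequence translator over keypoint features can be sketched roughly as below. This is a minimal sketch under assumed dimensions: the GRU cells, the additive (Bahdanau-style) attention, and every hyperparameter here are illustrative choices, not the paper's configuration (the paper compares several attention mechanisms).

```python
import torch
import torch.nn as nn

class Seq2SeqSignTranslator(nn.Module):
    """Minimal encoder-decoder sketch: encodes a sequence of normalized
    keypoint vectors and decodes a sentence with additive attention.
    All dimensions are illustrative assumptions."""

    def __init__(self, feat_dim=274, hidden=256, vocab_size=500):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRUCell(hidden * 2, hidden)
        self.attn = nn.Linear(hidden * 2, 1)  # additive attention score
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frames, targets):
        # frames: (batch, T, feat_dim); targets: (batch, L) token ids
        enc_states, h = self.encoder(frames)   # (batch, T, hidden)
        h = h.squeeze(0)                       # initial decoder state
        logits = []
        for t in range(targets.size(1)):
            # Score each encoder state against the current decoder state.
            query = h.unsqueeze(1).expand_as(enc_states)
            scores = self.attn(torch.cat([enc_states, query], dim=-1))
            weights = torch.softmax(scores, dim=1)      # over time steps
            context = (weights * enc_states).sum(dim=1)
            x = torch.cat([self.embed(targets[:, t]), context], dim=-1)
            h = self.decoder(x, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)      # (batch, L, vocab_size)
```

At training time `targets` provides teacher forcing; at inference the previous prediction would be fed back in its place.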


Introduction

The inability to hear is a major obstacle to smooth and natural communication for hearing-impaired people in a predominantly hearing world. Hearing-impaired people need help from professional sign language interpreters to communicate with hearing people, even when they must reveal very private and sensitive information, and as a result they often become isolated and withdrawn from society. This leads us to investigate the possibility of developing an artificial intelligence technology that understands and communicates with hearing-impaired people. Sign language recognition and translation are very challenging problems since the task involves interpretation between visual and linguistic information. Interpreting this collection of visual information as natural language sentences is one of the central challenges of the sign language translation problem.
