Abstract

Many advancements in computer vision and machine learning have shown potential for significantly improving the lives of people with disabilities. In particular, recent research has demonstrated that deep neural network models can be used to bridge the gap between deaf people who use sign language and hearing people. The major impediment to advancing such models is the lack of high-quality and large-scale training data. Moreover, previously released sign language datasets include few or no interrogative sentences compared to declarative sentences. In this paper, we introduce a new publicly available large-scale Korean Sign Language (KSL) dataset, KSL-Guide, that includes both declarative sentences and a comparable number of interrogative sentences, which are required for a model to achieve high performance in real-world interactive tasks deployed in service applications. Our dataset contains a total of 121K sign language video samples featuring sentences and words spoken by native KSL speakers, with extensive annotations (e.g., gloss, translation, keypoints, and timestamps). We exploit a multi-camera system to produce 3D human pose keypoints as well as 2D keypoints from multi-view RGB videos. Our experiments quantitatively demonstrate that including interrogative sentences in training for sign language recognition and translation tasks greatly improves performance. Furthermore, we show qualitative results by developing a prototype application using our dataset that provides an interactive guide service, helping to lower the communication barrier between sign language speakers and hearing people.
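As a rough illustration of how 2D keypoints from a calibrated multi-camera setup can be lifted to 3D, the sketch below triangulates corresponding keypoints from two views with OpenCV. The projection matrices and keypoint arrays here are hypothetical placeholders; the paper's actual multi-view pipeline is not specified in the abstract and may differ.

```python
# Minimal sketch: two-view triangulation of 2D pose keypoints into 3D.
# Assumes calibrated cameras (known 3x4 projection matrices) and that the
# 2D keypoints in both views are already matched by joint index.
import numpy as np
import cv2


def triangulate_keypoints(P1, P2, kpts_cam1, kpts_cam2):
    """Triangulate 3D keypoints from two calibrated views.

    P1, P2      : 3x4 camera projection matrices (intrinsics @ extrinsics)
    kpts_cam1/2 : (N, 2) arrays of 2D keypoints detected in each view
    returns     : (N, 3) array of 3D keypoints in the reference frame of P1/P2
    """
    pts1 = kpts_cam1.T.astype(np.float64)              # 2xN, as OpenCV expects
    pts2 = kpts_cam2.T.astype(np.float64)
    pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4xN homogeneous points
    pts3d = (pts4d[:3] / pts4d[3]).T                   # dehomogenize -> Nx3
    return pts3d
```

In practice, a setup with more than two cameras would typically triangulate each joint from all views in which it is confidently detected (e.g., via a least-squares solution), but the two-view case above conveys the basic idea.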
