This work was carried out by members of the Laboratory of Speech and Multimodal Interfaces of the St. Petersburg Federal Research Center of the Russian Academy of Sciences as part of an interdisciplinary research project aimed at creating an automatic Russian Sign Language translation system. The paper presents the design of a Russian Sign Language digital database for a specific subject area, namely «The first visit to doctor».
Our database is intended primarily as a dataset for training neural network-based systems for automatic translation from Russian Sign Language. It may also be of interest to sign language linguistics in general, since the approach shows how to handle a number of distorting phenomena typical of continuous sign language tracking, such as epenthesis, assimilation, reduction, and hold deletion.
The principal difference between the presented video data and other datasets developed for similar purposes is the use of continuous sign utterances and of elements of Russian Sign Language proper instead of so-called “signed Russian” (that is, a visual form of spoken Russian), which is popular in deaf schooling.
One of the most challenging problems addressed in the paper is the automatic segmentation of sign speech into separate meaningful units with clearly defined boundaries. Its solution is hampered by the deformation of signs in continuous speech. In this paper, the segmentation is carried out within the framework of the movement-hold model. This approach makes it possible to extract the functional core of each sign and to annotate the changes likely to appear in its different segments. Accordingly, each utterance was subdivided into movement and hold segments, and the resulting sign schemes were then entered into a separate directory containing information about the main parameters of each hold, including the changes it may undergo due to its immediate environment.
The result of this work is a full decomposition of the signs that make up the database, which can be applied in different statistical models of Russian Sign Language.
Other linguistic peculiarities of the database worth noting are the lexical variability of signs, mouthing, and code switching between Russian Sign Language and “signed Russian”. In addition, our signers learned Russian Sign Language in different regions of Russia, so the collected data is a potential source for research on the dialectal variability of Russian Sign Language.