Word separation in continuous sign language using isolated signs and post-processing

Razieh Rastgoo, Kourosh Kiani, Sergio Escalera

doi:10.1016/j.eswa.2024.123695

Abstract

Continuous Sign Language Recognition (CSLR) is a long-standing challenge in Computer Vision because the explicit boundaries between the words in a sign sentence are difficult to detect. To address this challenge, we propose a two-stage model. In the first stage, a predictor model that combines CNN, SVD, and LSTM is trained on isolated signs. In the second stage, we apply a post-processing algorithm to the Softmax outputs of the first stage in order to separate the isolated signs within the continuous signs. Although the proposed model is trained on isolated sign classes with similar frame counts, it is evaluated on continuous sign videos in which each isolated sign class has a different frame length. Because no large dataset includes both sign sequences and the corresponding isolated signs, two public Isolated Sign Language Recognition (ISLR) datasets, RKS-PERSIANSIGN and ASLLVD, are used for evaluation. Results on the continuous sign videos confirm the effectiveness of the proposed model in detecting isolated sign boundaries. The intuition behind the proposed post-processing methodology is to improve recognition accuracy by removing untrained and repetitive signs with a sliding window approach during the inference phase. This is the first instance of such a mechanism in this domain. We therefore present our methodology as a baseline that the research community can extend and evaluate on other real data.
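The post-processing idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that a sliding window has already produced one Softmax vector per window position, that "untrained" segments show up as low-confidence windows, and that "repetitive" signs are consecutive windows with the same predicted label. The function name `separate_signs`, the input layout, and the `threshold` value are all hypothetical.

```python
import numpy as np


def separate_signs(window_probs, threshold=0.8):
    """Collapse per-window Softmax outputs into a sequence of sign labels.

    window_probs: array-like of shape (num_windows, num_classes), one
        Softmax vector per sliding-window position (assumed layout).
    threshold: assumed confidence cut-off, not a value from the paper.
    """
    probs = np.asarray(window_probs, dtype=float)
    labels = probs.argmax(axis=1)      # most likely class per window
    confidences = probs.max(axis=1)    # its Softmax probability

    sequence = []
    previous = None
    for label, confidence in zip(labels, confidences):
        if confidence < threshold:
            # Low-confidence window: treated here as a transition or an
            # untrained sign, so it is dropped. Resetting `previous` means
            # the same sign reappearing after a gap counts as a new word.
            previous = None
            continue
        if label != previous:
            # First confident window of a new sign: keep its label once.
            sequence.append(int(label))
        previous = label
    return sequence
```

Under these assumptions, calling `separate_signs` on six windows whose confident predictions are classes 0, 0, 1, 1, 2, with one low-confidence window between the 0s and the 1s, yields one label per sign in order. Resetting the previous label on a low-confidence window is a design choice made for this sketch, so that a word signed twice with a pause in between is reported twice.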

Full Text