Abstract

Recurrent neural networks (RNNs) are a popular family of models widely used for sequential data such as videos. However, RNNs make assumptions about state transitions that can be detrimental. This paper presents two theoretical limitations of RNNs, along with popular extensions proposed to mitigate them. The effectiveness of these extensions is assessed in practice on sign language (SL) video tokenization, a task that remains challenging. The evaluated strategies improve transition modeling when RNNs function as state machines. However, this performance gain diminishes in more complex architectures, indicating that there is still room for improvement. Such improvement would help build powerful SL tokenizers usable in future natural language processing pipelines.
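For context, the state-transition assumption in question is that a vanilla RNN compresses the entire input history into a single fixed-size hidden state, updated at every step by one fixed rule (this is the standard textbook formulation, not a detail taken from the paper itself):

    h_t = tanh(W_h * h_{t-1} + W_x * x_t + b)

where h_t is the hidden state after step t, x_t is the input at step t, and W_h, W_x, b are learned parameters. Every transition must pass through this single bottleneck, which is what makes the assumption restrictive for long or complex sequences such as SL videos.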
