Abstract

Recurrent neural networks (RNNs) are a popular family of models widely used for sequential data such as videos. However, RNNs make assumptions about state transitions that can be detrimental. This paper presents two theoretical limitations of RNNs, along with popular extensions proposed to mitigate them. The effectiveness of these extensions is assessed in practice on sign language (SL) video tokenization, a task that remains challenging. The evaluated strategies improve transition modeling when RNNs function as state machines. However, this performance gain diminishes in more complex architectures, indicating that there is still room for improvement. Such improvement would help build powerful SL tokenizers usable in future natural language processing pipelines.
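For context, the state-transition assumption in question is that a vanilla RNN compresses the entire input history into a single fixed-size hidden state, updated at every step by one fixed rule (this is the standard textbook formulation, not a detail taken from the paper itself):

    h_t = tanh(W_h * h_{t-1} + W_x * x_t + b)

where h_t is the hidden state after step t, x_t is the input at step t, and W_h, W_x, b are learned parameters. Every transition must pass through this single bottleneck, which is what makes the assumption restrictive for long or complex sequences such as SL videos.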
