Abstract
We consider neural sign language translation: machine translation from signed to written languages using encoder–decoder neural networks. Translating sign language videos to written language text is especially complex because of the difference in modality between source and target language and, consequently, the required video processing. At the same time, sign languages are low-resource languages, their datasets dwarfed by those available for written languages. Recent advances in written language processing and success stories of transfer learning raise the question of how pretrained written language models can be leveraged to improve sign language translation. We apply the Frozen Pretrained Transformer (FPT) technique to initialize the encoder, decoder, or both, of a sign language translation model with parts of a pretrained written language model. We observe that the attention patterns transfer in zero-shot to the different modality and, in some experiments, we obtain higher scores (from 18.85 to 21.39 BLEU-4). Especially when gloss annotations are unavailable, FPTs can increase performance on unseen data. However, current models appear to be limited primarily by data quality and only then by data quantity, limiting potential gains with FPTs. Therefore, in further research, we will focus on improving the representations used as inputs to translation models.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.