Abstract

Continuous Chinese sign language recognition (CCSLR) methods have shown a strong ability to learn effective models from data. However, the scarcity of Chinese sign language data makes the CCSLR task difficult. In this work, we focus on a simple but important way to alleviate this data insufficiency: refining a CCSLR network to improve the robustness of its feature processing by exploiting higher-quality non-Chinese sign language datasets. To this end, we first conducted a simple empirical study to verify the feasibility of knowledge transfer for the CCSLR task. Surprisingly, simply pre-training our recognition model on a foreign sign language dataset refines the model and significantly improves its robustness. To make this practical, the key issue of how to fine-tune existing feature-processing models for effective guidance must be carefully investigated. We then propose FTP, a novel scheme for fine-tuning pre-trained models, which updates the spatial feature extractor initialized from a pre-trained backbone while freezing the temporal feature extractor, implemented as a shareable transformer encoder. Compared with the baseline method, our FTP method achieves significant performance improvements on the public USTC-CCSL dataset.
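
For illustration, the following is a minimal PyTorch sketch of what an FTP-style fine-tuning setup could look like, assuming a model that pairs a spatial CNN backbone with a temporal transformer encoder. All class, parameter, and checkpoint names (SignRecognizer, pretrained_foreign_sl.pt, etc.) are hypothetical; the abstract does not specify the actual implementation.

```python
# Hypothetical sketch of FTP-style fine-tuning: update the spatial feature
# extractor, freeze the temporal transformer encoder. Names are illustrative.
import torch
import torch.nn as nn


class SignRecognizer(nn.Module):
    """Hypothetical CCSLR model: spatial extractor + temporal transformer."""

    def __init__(self, feat_dim: int = 512, num_classes: int = 500):
        super().__init__()
        # Spatial feature extractor (e.g., a CNN backbone); updated during FTP.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Temporal feature extractor (shareable transformer encoder); frozen during FTP.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=4)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        feats = self.spatial(frames.flatten(0, 1)).view(b, t, -1)
        # Per-frame logits suitable for a continuous-recognition objective.
        return self.classifier(self.temporal(feats))


model = SignRecognizer()

# 1) Initialize from weights pre-trained on a foreign sign language dataset
#    ("pretrained_foreign_sl.pt" is a placeholder path).
# model.load_state_dict(torch.load("pretrained_foreign_sl.pt"))

# 2) Freeze the temporal transformer; keep the spatial backbone trainable.
for p in model.temporal.parameters():
    p.requires_grad = False

# 3) Optimize only the trainable (spatial + classifier) parameters on USTC-CCSL.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```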
