Abstract

End-to-end models are the state of the art for Automatic Speech Recognition (ASR) systems. Despite all their advantages, they suffer from a significant drawback: huge amounts of training data are required to achieve excellent performance. This requirement poses a serious challenge for low-resource languages such as Persian. One simple yet effective technique for addressing this issue is transfer learning. We aim to explore the effect of transfer learning on a speech recognition system for the Persian language. To this end, we first train the network on 960 hours of the English LibriSpeech corpus. We then transfer the trained network and fine-tune it on only about 3.5 hours of training data from the Persian FarsDat corpus. Transfer learning achieves better performance while requiring less training time than a model trained from scratch. Experimental results on the FarsDat corpus indicate that transfer learning with a few hours of Persian training data yields a 31.48% relative Phoneme Error Rate (PER) reduction compared to a model trained from scratch.
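The pretrain-then-fine-tune recipe described above can be illustrated with a minimal sketch. This is not the authors' actual ASR architecture; it is a toy linear model trained with gradient descent, where a large synthetic "source" dataset stands in for LibriSpeech and a tiny related "target" dataset stands in for the 3.5 hours of FarsDat. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(W, X, Y, lr=0.1, epochs=50):
    # Plain gradient descent on mean-squared error for a linear layer.
    for _ in range(epochs):
        grad = X.T @ (X @ W - Y) / len(X)
        W = W - lr * grad
    return W

# "Pretraining": plentiful source-task data (stand-in for LibriSpeech).
Xs = rng.normal(size=(1000, 8))
true_W = rng.normal(size=(8, 4))
Ys = Xs @ true_W
W_pre = train(np.zeros((8, 4)), Xs, Ys, epochs=200)

# "Fine-tuning": tiny target dataset on a related but shifted task
# (stand-in for the few hours of Persian FarsDat data).
Xt = rng.normal(size=(20, 8))
Yt = Xt @ (true_W + 0.1)

# Same budget of fine-tuning steps, two initializations:
W_scratch = train(np.zeros((8, 4)), Xt, Yt)      # random/zero init
W_transfer = train(W_pre.copy(), Xt, Yt)         # transferred weights

err_scratch = np.mean((Xt @ W_scratch - Yt) ** 2)
err_transfer = np.mean((Xt @ W_transfer - Yt) ** 2)
```

Because the transferred weights start close to the target task's solution, the same number of fine-tuning steps leaves the transferred model with a much lower error than the model trained from scratch, mirroring the faster convergence and better performance reported in the abstract.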

