Abstract

End-to-end models are the state of the art for Automatic Speech Recognition (ASR) systems. Despite all their advantages, they suffer from a significant drawback: huge amounts of training data are required to achieve excellent performance. This requirement poses a serious challenge for low-resource languages such as Persian. One simple yet effective technique for addressing this issue is transfer learning. We aim to explore the effect of transfer learning on a speech recognition system for the Persian language. To this end, we first train the network on 960 hours of the English LibriSpeech corpus. We then transfer the trained network and fine-tune it on only about 3.5 hours of training data from the Persian FarsDat corpus. Transfer learning achieves better performance while requiring less training time than a model trained from scratch. Experimental results on the FarsDat corpus indicate that transfer learning with a few hours of Persian training data yields a 31.48% relative Phoneme Error Rate (PER) reduction compared to a model trained from scratch.
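The pretrain-then-fine-tune recipe described above can be illustrated with a minimal sketch. This is not the authors' actual ASR architecture; it is a toy linear model trained with gradient descent, where a large synthetic "source" dataset stands in for LibriSpeech and a tiny related "target" dataset stands in for the 3.5 hours of FarsDat. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(W, X, Y, lr=0.1, epochs=50):
    # Plain gradient descent on mean-squared error for a linear layer.
    for _ in range(epochs):
        grad = X.T @ (X @ W - Y) / len(X)
        W = W - lr * grad
    return W

# "Pretraining": plentiful source-task data (stand-in for LibriSpeech).
Xs = rng.normal(size=(1000, 8))
true_W = rng.normal(size=(8, 4))
Ys = Xs @ true_W
W_pre = train(np.zeros((8, 4)), Xs, Ys, epochs=200)

# "Fine-tuning": tiny target dataset on a related but shifted task
# (stand-in for the few hours of Persian FarsDat data).
Xt = rng.normal(size=(20, 8))
Yt = Xt @ (true_W + 0.1)

# Same budget of fine-tuning steps, two initializations:
W_scratch = train(np.zeros((8, 4)), Xt, Yt)      # random/zero init
W_transfer = train(W_pre.copy(), Xt, Yt)         # transferred weights

err_scratch = np.mean((Xt @ W_scratch - Yt) ** 2)
err_transfer = np.mean((Xt @ W_transfer - Yt) ** 2)
```

Because the transferred weights start close to the target task's solution, the same number of fine-tuning steps leaves the transferred model with a much lower error than the model trained from scratch, mirroring the faster convergence and better performance reported in the abstract.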

