Abstract

Low-resource automatic speech recognition is a challenging task. To address it, multilingual meta-learning learns a better model initialization from many source-language tasks, allowing rapid adaptation to the target language. However, because multilingual pre-training places no constraint on the learned representations, a semantic space shared across languages is difficult to learn. In this work, we propose an adversarial meta-learning training approach to solve this problem. By adding language identification as an adversarial auxiliary objective within the meta-learning algorithm, we guide the model's encoder to generate language-independent embedding features, which improves generalization. We further use the Wasserstein distance and temporal normalization to make the adversarial training more stable and easier to optimize. The approach is evaluated on the IARPA BABEL corpus. The results reveal that our approach requires only half as many meta-learning pre-training epochs to attain performance comparable to multilingual pre-training. It also outperforms meta-learning when fine-tuning on all target languages and achieves comparable performance at small data scales. Specifically, it reduces CER from 71% to 62% when fine-tuning on 25% of the Vietnamese data. Finally, we use t-SNE to show why our approach is superior to others.
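To make the adversarial auxiliary objective concrete, below is a minimal PyTorch-style sketch of a language-adversarial loss of the kind the abstract describes. All module names, dimensions, and the loss weight are illustrative assumptions, not the authors' implementation; in particular, "temporal normalization" is read here as pooling the encoder output over time before the critic, and the ASR loss and meta-learning outer loop are omitted.

```python
# Illustrative sketch: Wasserstein-style language-adversarial training.
# Assumed names/sizes throughout; not the paper's exact implementation.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy acoustic encoder: features (B, T, F) -> embeddings (B, T, D)."""
    def __init__(self, feat_dim=80, emb_dim=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, emb_dim, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

class Critic(nn.Module):
    """Language critic for a Wasserstein-style adversarial loss.
    Pools over time (one reading of 'temporal normalization') so the
    score reflects utterance-level language traits, not frame order."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.score = nn.Linear(emb_dim, 1)

    def forward(self, emb):
        pooled = emb.mean(dim=1)          # temporal pooling
        return self.score(pooled).squeeze(-1)

encoder, critic = Encoder(), Critic()

# Two batches of acoustic features from two different source languages.
x_a = torch.randn(4, 100, 80)
x_b = torch.randn(4, 100, 80)
emb_a, emb_b = encoder(x_a), encoder(x_b)

# Critic step: maximize the score gap between languages (a Wasserstein-1
# estimate); in practice weight clipping or a gradient penalty would
# keep the critic Lipschitz.
critic_loss = -(critic(emb_a).mean() - critic(emb_b).mean())

# Encoder step: minimize the same gap (added to the usual ASR loss and
# wrapped in the meta-learning inner/outer loops, both omitted), so that
# embeddings of different languages become indistinguishable to the critic.
adv_weight = 0.1                          # assumed hyperparameter
encoder_loss = adv_weight * (critic(emb_a).mean() - critic(emb_b).mean())
```

In this sketch the encoder and critic play a minimax game: driving the critic's score gap toward zero is what would push the encoder toward the language-independent embedding space the abstract motivates.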
