Abstract

We address the effective finetuning of a large-scale pretrained model for automatic speech recognition (ASR) of low-resource languages with only a one-hour matched dataset. The finetuning consists of domain adaptation and language adaptation, which are conducted using heterogeneous datasets that match either the domain or the language. For effective adaptation, we incorporate auxiliary tasks of domain identification and language identification through multi-task learning. Moreover, the embedding result of the auxiliary tasks is fused with the encoder output of the pretrained model for ASR. Experimental evaluations on Khmer ASR using the corpus of the ECCC (the Extraordinary Chambers in the Courts of Cambodia) demonstrate that first conducting domain adaptation and then language adaptation is effective. In addition, multi-tasking with domain identification and fusing the domain ID embedding gives the best performance, a CER improvement of 6.47% absolute over the baseline finetuning method.
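The abstract describes fusing an auxiliary domain-ID embedding with the pretrained encoder's output under a multi-task objective. The following is a minimal NumPy sketch of that idea, not the authors' implementation: all shapes, names, and the additive fusion are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 50, 256        # hypothetical: encoder frames, hidden size
num_domains = 2       # hypothetical: e.g., in-domain vs. out-of-domain speech
true_domain = 0       # hypothetical ground-truth label for the aux task

# Pretrained encoder output and a learnable domain-embedding table (random here).
encoder_out = rng.standard_normal((T, D))
domain_embed_table = rng.standard_normal((num_domains, D))

# Auxiliary task: classify the domain from mean-pooled encoder features.
pooled = encoder_out.mean(axis=0)                       # (D,)
aux_logits = pooled @ rng.standard_normal((D, num_domains))
domain_id = int(np.argmax(aux_logits))                  # predicted domain ID

# Fusion: add the domain-ID embedding to every encoder frame
# before the ASR decoder consumes it.
fused = encoder_out + domain_embed_table[domain_id]     # (T, D)

# Multi-task objective (schematic): auxiliary cross-entropy term
# that would be weighted and added to the main ASR loss.
aux_probs = np.exp(aux_logits - aux_logits.max())
aux_probs /= aux_probs.sum()
aux_loss = -np.log(aux_probs[true_domain])
```

In a real system the fusion and the auxiliary classifier would be trained jointly with the ASR loss; the weighting between the two losses is a tunable hyperparameter.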
