Abstract

We address the effective finetuning of a large-scale pretrained model for automatic speech recognition (ASR) of low-resource languages with only a one-hour matched dataset. The finetuning consists of domain adaptation and language adaptation, which are conducted using heterogeneous datasets that match either the domain or the language. For effective adaptation, we incorporate auxiliary tasks of domain identification and language identification through multi-task learning. Moreover, the embedding result of the auxiliary tasks is fused with the encoder output of the pretrained model for ASR. Experimental evaluations on Khmer ASR using the corpus of the ECCC (the Extraordinary Chambers in the Courts of Cambodia) demonstrate that first conducting domain adaptation and then language adaptation is effective. In addition, multi-tasking with domain identification and fusing the domain ID embedding gives the best performance, a CER improvement of 6.47% absolute over the baseline finetuning method.
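The abstract describes fusing an auxiliary domain-ID embedding with the pretrained encoder's output under a multi-task objective. The following is a minimal NumPy sketch of that idea, not the authors' implementation: all shapes, names, and the additive fusion are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 50, 256        # hypothetical: encoder frames, hidden size
num_domains = 2       # hypothetical: e.g., in-domain vs. out-of-domain speech
true_domain = 0       # hypothetical ground-truth label for the aux task

# Pretrained encoder output and a learnable domain-embedding table (random here).
encoder_out = rng.standard_normal((T, D))
domain_embed_table = rng.standard_normal((num_domains, D))

# Auxiliary task: classify the domain from mean-pooled encoder features.
pooled = encoder_out.mean(axis=0)                       # (D,)
aux_logits = pooled @ rng.standard_normal((D, num_domains))
domain_id = int(np.argmax(aux_logits))                  # predicted domain ID

# Fusion: add the domain-ID embedding to every encoder frame
# before the ASR decoder consumes it.
fused = encoder_out + domain_embed_table[domain_id]     # (T, D)

# Multi-task objective (schematic): auxiliary cross-entropy term
# that would be weighted and added to the main ASR loss.
aux_probs = np.exp(aux_logits - aux_logits.max())
aux_probs /= aux_probs.sum()
aux_loss = -np.log(aux_probs[true_domain])
```

In a real system the fusion and the auxiliary classifier would be trained jointly with the ASR loss; the weighting between the two losses is a tunable hyperparameter.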
