Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings

Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara

doi:10.1142/s2717554523500248

Abstract

This study investigates how to effectively incorporate meta-information, such as domain and language, when finetuning a pretrained model based on self-supervised learning (SSL) for automatic speech recognition (ASR) in very low-resource settings. SSL pretrained models have been shown to achieve performance comparable to, or even better than, conventional end-to-end systems, even when finetuned on a small dataset. However, they still require a target dataset with a considerable amount of labeled data, such as 10 h, to achieve satisfactory performance. We therefore propose to exploit heterogeneous datasets that match the target only partially, in either language or domain, and to apply multi-task learning (MTL) or adversarial learning (ADV) using the meta-information. The finetuning comprises (1) domain adaptation, which uses in-domain multilingual datasets, and (2) language adaptation, which uses datasets of the same language but from different domains. The auxiliary task is language identification for domain adaptation and domain identification for language adaptation. We then embed the output of the auxiliary task into the encoder output of the ASR task. The target dataset is the Khmer corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC), with sizes ranging from 1 h to 10 h. Experimental evaluations demonstrate that fusing the meta-information in MTL or ADV significantly improves ASR accuracy. Moreover, a two-step adaptation method that first performs domain adaptation and then language adaptation is the most effective. We also show that only 5 h of labeled target data yields almost saturated performance.
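The abstract does not specify the exact architecture, so the following is only a minimal pure-Python sketch of the two ideas it describes, under loudly stated assumptions. It assumes that the auxiliary classifier's posterior (over languages or domains) weights a learned table of per-class embeddings, that the resulting meta embedding is added to every encoder frame, and that MTL and ADV differ only in the sign of the auxiliary gradient reaching the shared encoder, with ADV modeled as a gradient reversal layer. The function names, the weight `lam`, and additive fusion are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over the auxiliary classes."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_embedding(aux_logits: Sequence[float],
                   table: Sequence[Sequence[float]]) -> Vector:
    """Posterior-weighted sum of per-class embeddings (hypothetical fusion input).

    aux_logits: auxiliary-task logits, one per language or domain.
    table: one embedding row per auxiliary class.
    """
    probs = softmax(aux_logits)
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]


def fuse(encoder_out: Sequence[Sequence[float]],
         emb: Sequence[float]) -> List[Vector]:
    """Add the same meta embedding to every encoder frame (additive fusion assumed)."""
    return [[h + e for h, e in zip(frame, emb)] for frame in encoder_out]


def encoder_grad(grad_asr: Sequence[float], grad_aux: Sequence[float],
                 lam: float, mode: str) -> Vector:
    """Gradient reaching the shared encoder for one parameter vector.

    "mtl": the weighted auxiliary gradient is added, so the encoder is
           encouraged to encode the meta-information.
    "adv": a gradient reversal layer flips its sign, pushing the encoder
           toward language- or domain-invariant features.
    """
    if mode == "mtl":
        sign = 1.0
    elif mode == "adv":
        sign = -1.0
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [ga + sign * lam * gx for ga, gx in zip(grad_asr, grad_aux)]


if __name__ == "__main__":
    # Two encoder frames of dimension 2 and two auxiliary classes.
    frames = [[0.0, 1.0], [2.0, 3.0]]
    table = [[1.0, 0.0], [0.0, 1.0]]
    emb = meta_embedding([0.0, 0.0], table)  # uniform posterior
    print(fuse(frames, emb))
    print(encoder_grad([1.0, 1.0], [0.5, -0.5], 0.2, "mtl"))
    print(encoder_grad([1.0, 1.0], [0.5, -0.5], 0.2, "adv"))
```

With a uniform posterior over two classes, the meta embedding is the average of the two table rows, and each frame is shifted by it. In the sketch, the auxiliary classifier head itself would still be trained to minimize its own loss in both modes; only the gradient passed back into the shared encoder changes sign under ADV.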

More From: International Journal of Asian Language Processing

Similar Papers

Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language
Kak Soky ... Tatsuya Kawahara
04 Jun 2023

Is the Doctrine of Joint Criminal Enterprise a Legitimate Mode of Individual Criminal Liability? − A Study of the Khmer Rouge Trials
Kitti Jayangakula
SSRN Electronic Journal
11 Apr 2013

Internal rules of the extraordinary chambers in the courts of Cambodia (ECCC): Setting an example of the rule of law by breaking the law?
Journal of Labelled Compounds and Radiopharmaceuticals | VOL. 3
28 Feb 2011

Reparation Modalities at the Extraordinary Chambers in the Courts of Cambodia (ECCC)
Juan-Pablo Perez-Leon-Acevedo
The Law & Practice of International Courts and Tribunals | VOL. 19
27 Nov 2020
