Abstract
This paper presents CAM, a novel cross-lingual adaptation framework for low-resource language speech recognition (LLSR), built on the recently popular adapter method. CAM adapts self-supervised speech models (SSMs) from source languages to target low-resource languages in a two-stage process, filling two gaps in current methods: (i) language similarity is not effectively considered, and (ii) the performance-efficiency trade-off is not well balanced. To this end, two key components are designed: a similarity-aware fusion module (SAFM) and an adapter weight-sharing strategy (AWSS). SAFM introduces a well-trained adapter to compute precise language similarities via dot product, while AWSS balances performance and efficiency by sharing adapter weights. Experimental results on two corpora, FLEURS and Common Voice, demonstrate that CAM equipped with these two designs, denoted performance-oriented CAM (P-CAM), achieves state-of-the-art (SOTA) performance with satisfactory efficiency compared to current leading methods. In addition, an efficiency-oriented CAM (E-CAM) is presented by introducing a weight-space fusion module (WSFM), whose core idea is to average the weights of multiple adapters into a single new adapter. Compared to full fine-tuning, E-CAM requires only 5.0% of the trainable parameters while achieving a 2.3% relative average word error rate (WER) reduction. CAM thus offers a performance- or efficiency-oriented trade-off to meet the varying needs of speech recognition systems.
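The two fusion ideas named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: real adapters are per-layer tensors inside an SSM, and the language names, vector values, and softmax normalization below are illustrative assumptions. The sketch shows (a) SAFM-style dot-product similarities turned into fusion weights, and (b) WSFM-style weight-space fusion by averaging multiple adapters into one.

```python
import math

# Toy adapter "weights": one flat vector per source language.
# Hypothetical values; real adapters are per-layer tensors in the SSM.
adapters = {
    "lang_a": [0.2, 0.5, -0.1],
    "lang_b": [0.1, 0.4, 0.0],
    "lang_c": [-0.3, 0.2, 0.6],
}
# Hypothetical representation of the target low-resource language.
target_repr = [0.15, 0.45, -0.05]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# SAFM-style similarity: a dot product between the target representation
# and each source-language adapter, normalized (here via softmax, an
# assumed choice) into fusion weights.
langs = list(adapters)
sims = [dot(target_repr, adapters[l]) for l in langs]
fusion_weights = dict(zip(langs, softmax(sims)))

# WSFM-style weight-space fusion: average the weights of the multiple
# adapters into a single new adapter, so inference costs no more than
# running one adapter.
dim = len(target_repr)
fused_adapter = [
    sum(adapters[l][i] for l in langs) / len(langs) for i in range(dim)
]
```

As a usage intuition: the most similar source language (here `lang_a`) receives the largest fusion weight, while `fused_adapter` is a single parameter set regardless of how many source languages were merged, which is the efficiency argument behind E-CAM.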