Language fusion via adapters for low-resource speech recognition

Qing Hu, Yan Zhang, Xianlei Zhang, Zongyu Han, Xiuxia Liang

doi:10.1016/j.specom.2024.103037

Abstract

Data scarcity causes low-resource speech recognition systems to overfit severely. Although fine-tuning alleviates this issue to some extent, it is parameter-inefficient. This paper proposes a novel language knowledge fusion method named LanFusion. It builds on the recently popular adapter-tuning technique and is therefore more parameter-efficient than conventional fine-tuning. LanFusion is a two-stage method. First, multiple adapters are trained on several source languages to extract language-specific and language-invariant knowledge. Then, the trained adapters are re-trained on the target low-resource language to fuse the learned knowledge. Compared with Vanilla-adapter, LanFusion obtains relative average word error rate (WER) reductions of 9.8% and 8.6% on the Common Voice and FLEURS corpora, respectively. Extensive experiments demonstrate that the proposed method is simple, effective, and parameter-efficient. In addition, using source languages that are geographically similar to the target language yields better results on both datasets.
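
As a reading aid for the reported metric, the sketch below shows one way a relative average WER reduction could be computed. It assumes the figure is derived from WERs averaged over the target languages, which the abstract does not spell out. The function names and the WER values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer over baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def relative_average_wer_reduction(baseline_wers, new_wers) -> float:
    """Average each system's WER over languages, then compare the two averages.

    Assumption: this averaging order is one plausible reading of
    "relative average WER reduction"; the paper may define it differently.
    """
    if len(baseline_wers) != len(new_wers):
        raise ValueError("both systems need one WER per language")
    return relative_wer_reduction(mean(baseline_wers), mean(new_wers))


# Hypothetical per-language WERs (percent), for illustration only.
baseline = [40.0, 60.0]   # e.g. a baseline adapter system
proposed = [37.0, 53.0]   # e.g. a fused-adapter system

print(round(relative_average_wer_reduction(baseline, proposed), 1))
```

A relative reduction is measured against the baseline's own error rate, so the same absolute gain counts for more when the baseline WER is low.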

Journal: Speech Communication
Publication Date: Feb 23, 2024
Citations: 1

Similar Papers

CAM: A cross-lingual adaptation framework for low-resource language speech recognition
Qing Hu ... Xilong Yu
Information Fusion | VOL. 111
06 Jun 2024

Towards Fast and Accurate Streaming End-To-End ASR
Bo Li ... Tara N Sainath
01 May 2020

Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech
Cong-Thanh Do
01 May 2019

Adversarial Multilingual Training for Low-Resource Speech Recognition
Jiangyan Yi ... Ye Bai
01 Apr 2018
