Abstract
This paper presents CAM, a novel cross-lingual adaptation framework for low-resource language speech recognition (LLSR), built on the recently popular adapter method. CAM adapts self-supervised speech models (SSMs) from source languages to target low-resource languages in a two-stage process, filling two gaps in current methods: (i) language similarity is not effectively considered, and (ii) the performance-efficiency trade-off is not well balanced. To this end, two key components are designed: a similarity-aware fusion module (SAFM) and an adapter weight-sharing strategy (AWSS). SAFM introduces a well-trained adapter to compute precise language similarities via dot product, while AWSS balances performance and efficiency by sharing adapter weights. Experimental results on two corpora, FLEURS and Common Voice, demonstrate that CAM equipped with these two designs, denoted performance-oriented CAM (P-CAM), achieves state-of-the-art (SOTA) performance with satisfactory efficiency compared to current leading methods. In addition, an efficiency-oriented CAM (E-CAM) is presented by introducing a weight-space fusion module (WSFM), whose core idea is to average the weights of multiple adapters into a single new adapter. Compared to full fine-tuning, E-CAM requires only 5.0% of the trainable parameters while achieving a 2.3% relative average word error rate (WER) reduction. CAM thus offers a performance- or efficiency-oriented trade-off to meet the varying needs of speech recognition systems.
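The two fusion ideas named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: real adapters are per-layer tensors inside an SSM, and the language names, vector values, and softmax normalization below are illustrative assumptions. The sketch shows (a) SAFM-style dot-product similarities turned into fusion weights, and (b) WSFM-style weight-space fusion by averaging multiple adapters into one.

```python
import math

# Toy adapter "weights": one flat vector per source language.
# Hypothetical values; real adapters are per-layer tensors in the SSM.
adapters = {
    "lang_a": [0.2, 0.5, -0.1],
    "lang_b": [0.1, 0.4, 0.0],
    "lang_c": [-0.3, 0.2, 0.6],
}
# Hypothetical representation of the target low-resource language.
target_repr = [0.15, 0.45, -0.05]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# SAFM-style similarity: a dot product between the target representation
# and each source-language adapter, normalized (here via softmax, an
# assumed choice) into fusion weights.
langs = list(adapters)
sims = [dot(target_repr, adapters[l]) for l in langs]
fusion_weights = dict(zip(langs, softmax(sims)))

# WSFM-style weight-space fusion: average the weights of the multiple
# adapters into a single new adapter, so inference costs no more than
# running one adapter.
dim = len(target_repr)
fused_adapter = [
    sum(adapters[l][i] for l in langs) / len(langs) for i in range(dim)
]
```

As a usage intuition: the most similar source language (here `lang_a`) receives the largest fusion weight, while `fused_adapter` is a single parameter set regardless of how many source languages were merged, which is the efficiency argument behind E-CAM.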