Abstract
Phonology-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) for extracting reliable AFs, and different multi-stream techniques for cross-lingual speech recognition. First, a novel universal definition of phonological attributes is proposed for Mandarin, English, German and French. Then a DANN-based AFs detector is trained on the source languages (English, German and French). During cross-lingual speech recognition, the AFs detectors transfer phonological knowledge from the source languages (English, German and French) to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and the cross-lingual AFs. In addition, a monolingual AFs system (i.e., AFs extracted directly from the target language) is also investigated. Experiments show that the performance of the AFs detector can be improved by using convolutional neural networks (CNNs) with domain-adversarial learning. The multi-head attention (MHA) based multi-stream approach achieves the best performance compared with the baseline, the cross-lingual adaptation approach, and the other approaches. More specifically, under restricted training-data sizes the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs, and the approach can be easily extended to other low-resource languages.
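The abstract describes training the AFs detector with domain-adversarial learning. The core mechanism of a DANN is a gradient reversal layer: an identity map in the forward pass whose gradient is negated (and scaled by a factor lambda) in the backward pass, so the feature extractor learns representations that the domain (here, language) classifier cannot separate. The sketch below is a minimal numpy illustration of that behavior, not the paper's implementation; the function names and the lambda value are hypothetical.

```python
import numpy as np

def grl_forward(x):
    # Gradient Reversal Layer: plain identity in the forward pass
    return x

def grl_backward(grad, lam=1.0):
    # Backward pass: the incoming gradient is scaled by -lambda, so
    # minimizing the domain-classifier loss *maximizes* domain confusion
    # for the shared feature extractor upstream of this layer.
    return -lam * grad

# Toy check: forward leaves activations untouched, backward flips the sign.
x = np.array([0.5, -1.2, 3.0])
fwd = grl_forward(x)
bwd = grl_backward(np.ones_like(x), lam=0.3)
```

In a full DANN the shared CNN feeds two heads (attribute classifier and language classifier), with the reversal layer sitting only on the path to the language classifier.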
Highlights
The attribute-based features (AFs) detector is the key component of our framework
We compare results using AFs and bottleneck features (BNF) in the MHA mode; the results indicate that AFs still outperform BNF
Different languages do not share the same phone set, yet they share phonological knowledge at the AFs level
Summary
Phonological research has demonstrated that each sound unit of a language can be split into smaller phonological units based on the articulators used to produce the respective sound. Earlier work trained detectors for these (articulatory class) features and further used the detectors to process an English utterance spoken by a non-native Mandarin speaker [13]. In those experiments, both the stop and nasal attributes were correctly detected, which suggests that speech attributes can be used for cross-lingual speech recognition between English and Mandarin. There are few studies on multilingual speech recognition that integrate AFs; Hari Krishna et al. trained a bank of AFs detectors on source languages to predict the articulatory features for target languages, and showed that combining AFs with the AF-Tandem method performs better than the lattice-rescoring approach [14]. In [17], researchers proposed a multi-stream setup to combine M-vector features (sub-band based energy modulation features) with MFCCs, which improves ASR performance. Considering these previous works, multi-stream fusion is an effective way to boost ASR systems, especially in challenging tasks (i.e., noisy environments and low-resource ASR).
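The summary above argues for multi-stream fusion, and the abstract names multi-head attention as the best-performing fusion mode. The following numpy sketch shows the generic MHA computation applied to two streams (an acoustic stream attending to an AF stream); it is only an illustration of the mechanism, and the stream dimensions, head count, and variable names are hypothetical rather than taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention with the model dim split across heads.

    q, k, v: arrays of shape (seq_len, d_model); d_model % num_heads == 0.
    """
    seq_len, d_model = q.shape
    d_head = d_model // num_heads
    outs = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        qh, kh, vh = q[:, s], k[:, s], v[:, s]
        att = softmax(qh @ kh.T / np.sqrt(d_head))  # (seq_len, seq_len)
        outs.append(att @ vh)                       # (seq_len, d_head)
    return np.concatenate(outs, axis=1)             # (seq_len, d_model)

# Two-stream fusion: acoustic frames (queries) attend to AF frames (keys/values).
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(20, 64))  # e.g. filterbank stream (toy dims)
afs = rng.normal(size=(20, 64))       # cross-lingual AF stream (toy dims)
fused = multi_head_attention(acoustic, afs, afs, num_heads=4)
```

In practice the learned fused representation would then feed the acoustic model; here the point is only that MHA gives the acoustic stream a frame-by-frame, content-dependent weighting over the AF stream rather than a fixed concatenation.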