Domain Generalization for Language-Independent Automatic Speech Recognition.

Heting Gao, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson, Junrui Ni, Yang Zhang

doi:10.3389/frai.2022.806274

Abstract

A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained. Language-independent ASR is difficult to train because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the International Phonetic Alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language independence may be approximated in different ways, depending on the size of the training set, the presence or absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and every language in the test set is related to (shares a language family with) a language in the training set, language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition.
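
The difference between empirical risk minimization and invariant risk minimization can be illustrated with a minimal sketch. It assumes the IRMv1 penalty of Arjovsky et al. (2019), treats each training language as one environment, and uses per-token cross-entropy, which is closer to the phone token classification setting than to CTC-based recognition. The function names, the NumPy implementation, and the weight `lam` are illustrative assumptions; the abstract does not specify the article's exact formulation.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the phone-class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the correct phone label.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))


def irm_penalty(logits, labels):
    # IRMv1 penalty (assumed form): squared gradient of the risk with
    # respect to a scalar multiplier w on the logits, evaluated at w = 1.
    # For softmax cross-entropy this gradient has the closed form
    # mean_i sum_k (p_ik - onehot_ik) * z_ik.
    probs = softmax(logits)
    onehot = np.eye(logits.shape[1])[labels]
    grad_w = np.mean(np.sum((probs - onehot) * logits, axis=1))
    return grad_w ** 2


def erm_objective(envs):
    # Empirical risk minimization: sum the risk over all environments
    # (here, hypothetically, one environment per training language).
    return sum(cross_entropy(z, y) for z, y in envs)


def irm_objective(envs, lam=1.0):
    # Invariant risk minimization: risk plus a per-environment penalty
    # that is small when the same classifier is near-optimal everywhere.
    return sum(cross_entropy(z, y) + lam * irm_penalty(z, y) for z, y in envs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy "languages": random logits over 4 phone classes.
    envs = [
        (rng.normal(size=(8, 4)), rng.integers(0, 4, size=8)),
        (rng.normal(size=(5, 4)), rng.integers(0, 4, size=5)),
    ]
    print("ERM objective:", erm_objective(envs))
    print("IRM objective:", irm_objective(envs, lam=10.0))
```

With `lam=0` the IRM objective reduces to the ERM objective, so the penalty weight controls how strongly invariance across languages is enforced. Regret minimization, the second strategy named in the abstract, is not shown here.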

Journal: Frontiers in Artificial Intelligence
Publication Date: May 12, 2022
Citations: 1
License type: CC BY 4.0
