Cross-lingual Speech Recognition Research Articles

Phonology-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) for extracting reliable AFs, together with different multi-stream techniques for cross-lingual speech recognition. First, a novel universal definition of phonological attributes is proposed for Mandarin, English, German and French. Then a DANN-based AF detector is trained on the source languages (English, German and French). For cross-lingual speech recognition, the AF detectors transfer phonological knowledge from the source languages to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features with the cross-lingual AFs. In addition, a monolingual AF system (i.e., one in which the AFs are extracted directly from the target language) is investigated. Experiments show that the performance of the AF detector can be improved by using convolutional neural networks (CNNs) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream approach achieves the best performance, outperforming the baseline, the cross-lingual adaptation approach, and the other approaches. More specifically, when the amount of training data is restricted, the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs, and the approach can be easily extended to other low-resource languages.
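Domain-adversarial networks are commonly trained with a gradient reversal step between the shared feature extractor and the domain (here, language) classifier. The abstract does not say how the paper implements this, so the following is only a minimal, dependency-free sketch of that general idea. The class name `GradientReversal`, the weight `lam`, and the helper `feature_extractor_gradient` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    lam: float = 1.0  # hypothetical trade-off weight, not a value from the paper

    def forward(self, features: List[float]) -> List[float]:
        # Features reach the language (domain) classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier: List[float]) -> List[float]:
        # The sign flip pushes the shared extractor toward features that make
        # the source languages harder to tell apart.
        return [-self.lam * g for g in grad_from_domain_classifier]


def feature_extractor_gradient(
    grad_af: List[float], grad_domain: List[float], grl: GradientReversal
) -> List[float]:
    # Gradient reaching the shared extractor: the AF-detection gradient plus
    # the reversed gradient from the language classifier.
    reversed_grad = grl.backward(grad_domain)
    return [a + r for a, r in zip(grad_af, reversed_grad)]
```

In a real system an autodiff framework would handle the backward pass; the sketch only makes the sign flip explicit.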

In this study we present approaches to multilingual speech recognition. We first define the different approaches, namely portation, cross-lingual and simultaneous multilingual speech recognition. We then report experiments in multilingual speech recognition. In recent years we have ported our recognizer to languages other than German (Italian, Slovak, Slovenian, Czech, English, Japanese). We found that some languages achieve higher recognition performance on comparable tasks, and are thus easier for automatic speech recognition than others. Furthermore, we present experiments that show the performance of cross-lingual speech recognition, in which an untrained language is recognized with a recognizer trained on other languages. The substitution of phones is important for cross-lingual and simultaneous multilingual recognition. We compared cross-lingual recognition results for different baseline systems and found that the number of shared acoustic units is very important for performance. With simultaneous multilingual recognition, performance usually decreases compared to monolingual recognition. In a few cases, however, such as non-native speech, recognition can be improved.
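The role of phone substitution and shared acoustic units can be illustrated with a small sketch. It assumes a hand-written fallback table for target-language phones missing from the source inventory; the function name `substitute_phones`, the table, and the phone symbols are made up for illustration and are not the authors' inventories or method.

```python
from typing import Dict, List, Set, Tuple


def substitute_phones(
    target_phones: List[str],
    source_inventory: Set[str],
    fallback: Dict[str, str],
) -> Tuple[List[str], float]:
    """Map target-language phones onto a source-language inventory.

    Returns the mapped sequence and the fraction of distinct target phones
    that the source inventory already shares.
    """
    mapped = []
    for phone in target_phones:
        if phone in source_inventory:
            mapped.append(phone)  # shared unit: reuse the source model directly
        else:
            mapped.append(fallback[phone])  # unseen unit: hand-picked substitute
    distinct = set(target_phones)
    shared_ratio = len(distinct & source_inventory) / len(distinct)
    return mapped, shared_ratio


# Toy, made-up symbols for illustration only (not real phone inventories).
source = {"a", "i", "t", "s"}
fallback = {"y": "i", "ts": "t"}
mapped, ratio = substitute_phones(["t", "a", "y", "ts", "a"], source, fallback)
```

Under this toy setup, a higher shared ratio means fewer phones need a substitute, which is one way to read the finding that the number of shared acoustic units matters.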

Articles published on Cross-lingual Speech Recognition

Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition

Multilingual speech recognition in seven languages
