Multilingual Deep Neural Network Research Articles

In the FAME! project, we aim to develop an automatic speech recognition (ASR) system for Frisian-Dutch code-switching (CS) speech extracted from the archives of a local broadcaster with the ultimate goal of building a spoken document retrieval system. Unlike Dutch, Frisian is a low-resourced language with a very limited amount of manually annotated speech data. In this paper, we describe several automatic annotation approaches to enable using of a large amount of raw bilingual broadcast data for acoustic model training in a semi-supervised setting. Previously, it has been shown that the best-performing ASR system is obtained by two-stage multilingual deep neural network (DNN) training using 11 hours of manually annotated CS speech (reference) data together with speech data from other high-resourced languages. We compare the quality of transcriptions provided by this bilingual ASR system with several other approaches that use a language recognition system for assigning language labels to raw speech segments at the front-end and using monolingual ASR resources for transcription. We further investigate automatic annotation of the speakers appearing in the raw broadcast data by first labeling with (pseudo) speaker tags using a speaker diarization system and then linking to the known speakers appearing in the reference data using a speaker recognition system. These speaker labels are essential for speaker-adaptive training in the proposed setting. We train acoustic models using the manually and automatically annotated data and run recognition experiments on the development and test data of the FAME! speech corpus to quantify the quality of the automatic annotations. The ASR and CS detection results demonstrate the potential of using automatic language and speaker tagging in semi-supervised bilingual acoustic model training.

Read full abstract

It is challenging to obtain large amounts of native (matched) labels for speech audio in underresourced languages. This challenge is often due to a lack of literate speakers of the language, or in extreme cases, a lack of universally acknowledged orthography as well. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the underresourced language of interest called the target language (in place of native speakers), to transcribe what they hear as nonsense speech in their own annotation language ( $\ne$ target language). Previous uses of mismatched transcription converted it to a probabilistic transcription (PT), but PT is limited by the errors of nonnative perception. This paper proposes, instead, a multitask learning framework in which one deep neural network (DNN) is trained to optimize two separate tasks: acoustic modeling of a small number of matched transcription with matched target-language graphemes; and acoustic modeling of a large number of mismatched transcription with mismatched annotation-language graphemes. We find that: first, the multitask learning framework gives significant improvement over monolingual, semisupervised learning, multilingual DNN training, and transfer learning baselines; second, a Gaussian Mixture Model-Hidden-Markov Model (GMM-HMM) model adapted using PT improves alignments, thereby improving training; and third, bottleneck features trained on the mismatched transcriptions lead to even better alignments, resulting in further performance gains of the multitask DNN. Our experiments are conducted on the IARPA Georgian and Vietnamese BABEL corpora as well as on our newly collected speech corpus of Singapore Hokkien, an underresourced language with no standard written form.

Read full abstract

Multilingual Deep Neural Network Research Articles

Related Topics

Articles published on Multilingual Deep Neural Network

Investigation of Automatic Speech Recognition Systems via the Multilingual Deep Neural Network Modeling Methods for a Very Low-Resource Language, Chaha

Towards end-to-end speech recognition with transfer learning

Cross-Entropy Training of DNN Ensemble Acoustic Models for Low-Resource ASR

Semi-supervised acoustic model training for speech with code-switching

Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription

Using Weighted Model Averaging in Distributed Multilingual DNNs to Improve Low Resource ASR

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Multilingual Deep Neural Network Research Articles

Related Topics

Articles published on Multilingual Deep Neural Network

Investigation of Automatic Speech Recognition Systems via the Multilingual Deep Neural Network Modeling Methods for a Very Low-Resource Language, Chaha

Towards end-to-end speech recognition with transfer learning

Cross-Entropy Training of DNN Ensemble Acoustic Models for Low-Resource ASR

Semi-supervised acoustic model training for speech with code-switching

Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription

Using Weighted Model Averaging in Distributed Multilingual DNNs to Improve Low Resource ASR