Abstract

It is challenging to obtain large amounts of native (matched) labels for speech audio in underresourced languages. This challenge often stems from a lack of literate speakers of the language or, in extreme cases, the lack of a universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the underresourced language of interest (called the target language) to transcribe, in place of native speakers, what they hear as nonsense speech in their own annotation language ( $\ne$ target language). Previous uses of mismatched transcription converted it to a probabilistic transcription (PT), but PT is limited by the errors of nonnative perception. This paper proposes, instead, a multitask learning framework in which one deep neural network (DNN) is trained to optimize two separate tasks: acoustic modeling of a small amount of matched transcription labeled with target-language graphemes, and acoustic modeling of a large amount of mismatched transcription labeled with annotation-language graphemes. We find that: first, the multitask learning framework gives significant improvement over monolingual, semisupervised learning, multilingual DNN training, and transfer learning baselines; second, a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) adapted using PT improves alignments, thereby improving training; and third, bottleneck features trained on the mismatched transcriptions lead to even better alignments, resulting in further performance gains of the multitask DNN. Our experiments are conducted on the IARPA Georgian and Vietnamese BABEL corpora as well as on our newly collected speech corpus of Singapore Hokkien, an underresourced language with no standard written form.
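The multitask architecture described above, a single DNN with a shared trunk and two task-specific output layers (one over target-language graphemes for matched transcriptions, one over annotation-language graphemes for mismatched transcriptions), can be sketched as follows. This is a minimal illustrative forward pass only; the layer sizes, grapheme-inventory sizes, and feature dimensionality are assumptions for the sketch, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Row-wise softmax with max-shift for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MultitaskDNN:
    """Shared hidden trunk with two task-specific softmax heads:
    one over target-language graphemes (matched transcriptions),
    one over annotation-language graphemes (mismatched transcriptions).
    Gradients from both tasks would update the shared weights during training."""

    def __init__(self, n_feats, n_hidden, n_target_graphemes, n_annot_graphemes):
        self.W_shared = rng.standard_normal((n_feats, n_hidden)) * 0.1
        self.W_target = rng.standard_normal((n_hidden, n_target_graphemes)) * 0.1
        self.W_annot = rng.standard_normal((n_hidden, n_annot_graphemes)) * 0.1

    def forward(self, x, task):
        h = np.tanh(x @ self.W_shared)      # shared acoustic representation
        if task == "matched":               # target-language grapheme posteriors
            return softmax(h @ self.W_target)
        return softmax(h @ self.W_annot)    # annotation-language grapheme posteriors

# Toy batch: 4 frames of 40-dimensional acoustic features (illustrative sizes).
net = MultitaskDNN(n_feats=40, n_hidden=64, n_target_graphemes=30, n_annot_graphemes=26)
x = rng.standard_normal((4, 40))
p_matched = net.forward(x, "matched")
p_mismatched = net.forward(x, "mismatched")
```

Because both heads read from the same hidden layer, the plentiful mismatched transcriptions regularize the shared representation that the scarce matched transcriptions alone could not train well.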
