Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks

Kun Li, Helen Meng, Xiaojun Qian

doi:10.1109/taslp.2016.2621675

Abstract

This paper investigates the use of multidistribution deep neural networks (DNNs) for mispronunciation detection and diagnosis (MDD), to circumvent the difficulties encountered in an existing approach based on extended recognition networks (ERNs). ERNs leverage existing automatic speech recognition technology by constraining the search space to the canonical transcriptions of the target words plus their likely phonetic error patterns. MDD is achieved by comparing the recognized transcriptions with the canonical ones. Although this approach performs reasonably well, it has the following issues: 1) learning the error patterns of the target words to generate the ERNs remains a challenging task, and phones or phone errors missing from the ERNs cannot be recognized even with well-trained acoustic models; and 2) acoustic models and phonological rules are trained independently, and hence contextual information is lost. To address these issues, we propose an acoustic-graphemic-phonemic model (AGPM) using a multidistribution DNN, whose input features include acoustic features as well as the corresponding graphemes and canonical transcriptions encoded as binary vectors. The AGPM can implicitly model both grapheme-to-likely-pronunciation and phoneme-to-likely-pronunciation conversions, which are integrated into acoustic modeling. With the AGPM, we develop a unified MDD framework, which works much like free-phone recognition. Experiments show that our method achieves a phone error rate (PER) of 11.1%. The false rejection rate (FRR), false acceptance rate (FAR), and diagnostic error rate (DER) for MDD are 4.6%, 30.5%, and 13.5%, respectively. It outperforms the ERN approach using DNNs as acoustic models, whose PER, FRR, FAR, and DER are 16.8%, 11.0%, 43.6%, and 32.3%, respectively.
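The abstract reports FRR, FAR, and DER without defining them. The sketch below shows one common way such rates are computed by comparing recognized phones against canonical ones. It is an illustration under stated assumptions, not the authors' evaluation code: the three phone sequences are assumed to be already aligned one-to-one (insertions and deletions are ignored), the metric definitions are the ones commonly used in the MDD literature, and the names `MddCounts`, `count_mdd`, and `mdd_rates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MddCounts:
    true_accept: int = 0    # correctly pronounced, recognized as canonical
    false_reject: int = 0   # correctly pronounced, but flagged as an error
    false_accept: int = 0   # mispronounced, but recognized as canonical
    correct_diag: int = 0   # mispronunciation detected, phone identified correctly
    diag_error: int = 0     # mispronunciation detected, wrong phone reported


def count_mdd(canonical, annotated, recognized):
    """Tally MDD outcomes for three pre-aligned phone sequences.

    canonical:  phones the learner was expected to say
    annotated:  phones the learner actually said (human labels)
    recognized: phones output by the recognizer
    Assumption: equal-length, position-aligned sequences.
    """
    if not (len(canonical) == len(annotated) == len(recognized)):
        raise ValueError("sequences must be pre-aligned to equal length")
    counts = MddCounts()
    for canon, said, hyp in zip(canonical, annotated, recognized):
        if said == canon:               # learner pronounced the phone correctly
            if hyp == canon:
                counts.true_accept += 1
            else:
                counts.false_reject += 1
        elif hyp == canon:              # error missed by the recognizer
            counts.false_accept += 1
        elif hyp == said:               # error detected and diagnosed correctly
            counts.correct_diag += 1
        else:                           # error detected, diagnosis wrong
            counts.diag_error += 1
    return counts


def mdd_rates(counts):
    """Return FRR, FAR, and DER as fractions (assumed common definitions)."""
    correct = counts.true_accept + counts.false_reject
    detected = counts.correct_diag + counts.diag_error
    mispronounced = counts.false_accept + detected
    return {
        "FRR": counts.false_reject / correct if correct else 0.0,
        "FAR": counts.false_accept / mispronounced if mispronounced else 0.0,
        "DER": counts.diag_error / detected if detected else 0.0,
    }
```

Under these definitions, FRR is normalized by the number of correctly pronounced phones, FAR by the number of mispronounced phones, and DER only by the mispronunciations that were detected, which is why the three rates in the abstract do not share a denominator.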

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2017
Citations: 129

Similar Papers

Mispronunciation detection and diagnosis in L2 English speech using multi-distribution Deep Neural Networks
Kun Li ... Helen Meng
01 Sep 2014

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech
Shaoguang Mao ... Zhiyong Wu
01 Jul 2018

Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech
Shaoguang Mao ... Zhiyong Wu
01 Apr 2018

Voice activity detection applied to hands-free spoken dialogue robot based on decoding using acoustic and language model
...
15 Oct 2007
