Deep neural network acoustic models for spoken assessment applications

Jian Cheng, Xin Chen, Angeliki Metallinou

doi:10.1016/j.specom.2015.07.006

Abstract

In this paper, we investigate the effectiveness of deep neural network hidden Markov models (DNN-HMMs) for acoustic modeling in educational applications. Specifically, we focus on spoken responses from non-native speakers and children, whose speech tends to show great acoustic variability. We perform comprehensive experiments to compare the performance of traditional Gaussian mixture model (GMM)-HMMs and DNN-HMMs on three large language assessment datasets that contain various spoken tasks, classified broadly as constrained and open-ended tasks. Our experimental results suggest conclusions that can help guide the design of real-life educational applications. DNN-HMMs outperform conventional GMM-HMMs by a large margin for all spoken tasks commonly used in spoken assessment applications. In our experiments, DNN-HMMs trained on 25 h of data can outperform GMM-HMMs trained on 6.7 to 9 times as much data. In terms of overall performance, when all available training data were used (175, 227, and 169 h, respectively), switching from GMMs to DNNs yielded a relative word error rate decrease of 20.4% for adult English and 29.3% for child English, and a relative character error rate decrease of 14.3% for adult Chinese. Comparing task types, we find that the more challenging open-ended tasks benefit significantly more from DNN-HMMs than constrained tasks do. For open-ended tasks, large amounts of training data are key, as DNN-HMMs can take full advantage of the added data to push performance further. In contrast, the performance on constrained spoken tasks saturates at around 25 h of training data. At the same time, constrained spoken tasks require only a few hours of data (1 or 5 h) to build well-performing acoustic models. This is an encouraging observation: it indicates the potential to build reliable spoken assessment applications based on constrained tasks when little domain-specific training data is available.
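
The relative error rate decreases quoted in the abstract are ratios of the form (baseline - new) / baseline, computed on word error rate (WER) for English and character error rate (CER) for Chinese. The following is a minimal sketch of that arithmetic, assuming the standard edit-distance definition of WER and CER. The function names are illustrative and the inputs in the usage lines are made up; none of this is taken from the paper's own scoring setup.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length.

    Pass word lists to get WER, or character lists to get CER.
    """
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def relative_reduction(baseline, improved):
    """Relative decrease of an error rate with respect to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical inputs, for illustration only (not data from the paper).
wer = error_rate("the cat sat on the mat".split(),
                 "the cat sat on mat".split())
cer = error_rate(list("你好吗"), list("你好"))
gain = relative_reduction(0.25, 0.20)
```

Under this definition, a relative decrease describes the fraction of the baseline system's errors that the new system removes, which is why it can be compared across datasets with different absolute error rates.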

Journal: Speech Communication
Publication Date: Jul 29, 2015
Citations: 34
