Decrease In Word Error Rate Research Articles

In this paper, we investigate the effectiveness of deep neural network hidden Markov models (DNN-HMMs) for acoustic modeling in educational applications. Specifically, we focus on spoken responses from non-native speakers and children, which tend to show great acoustic variability. We perform comprehensive experiments comparing traditional Gaussian mixture model (GMM)-HMMs with DNN-HMMs on three large language assessment datasets that contain a variety of spoken tasks, classified broadly as constrained or open-ended. Our experimental results yield conclusions that can help guide the design of real-life educational applications. DNN-HMMs outperform conventional GMM-HMMs by a large margin on all spoken tasks commonly used in spoken assessment applications. In our experiments, DNN-HMMs trained on 25 h of data can outperform GMM-HMMs trained on 6.7 to 9 times as much data. Regarding overall performance, when all available training data were used (175, 227, and 169 h, respectively), switching from GMMs to DNNs yielded a relative word error rate decrease of 20.4% for adult English and 29.3% for child English, and a relative character error rate decrease of 14.3% for adult Chinese. Comparing task types, we find that the more challenging open-ended tasks benefit significantly more from DNN-HMMs than constrained item types do. For open-ended tasks, having large amounts of training data is key, as DNN-HMMs can take full advantage of the added data to push performance further. In contrast, performance on constrained spoken tasks saturates at around 25 h of training data, and these tasks require only a few hours of data (1 or 5 h) to build well-performing acoustic models.
This is an encouraging observation: it indicates the potential to build reliable spoken assessment applications based on constrained tasks when little domain-specific training data is available.
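The relative figures in this abstract compare each system's error rate against the GMM-HMM baseline, that is, (baseline - new) / baseline. The minimal Python sketch below, which is not the authors' scoring tool, computes an error rate as the Levenshtein edit distance (substitutions, deletions, insertions) between reference and hypothesis tokens divided by the reference length, and then the relative decrease. Splitting on whitespace gives WER; splitting into characters gives CER, which is assumed here to approximate how Chinese is scored. The function names and all example values are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    # prev[j] = distance between the first i-1 reference tokens and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (WER for words, CER for characters)."""
    if len(ref) == 0:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative decrease of an error rate with respect to a baseline system."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Word error rate: one deleted word out of six reference words.
    wer = error_rate("the cat sat on the mat".split(), "the cat sat on mat".split())
    # Character error rate: tokens are individual characters (no word segmentation).
    cer = error_rate(list("我爱你"), list("我你"))
    # Hypothetical baseline (GMM-HMM) and new (DNN-HMM) error rates, not from the paper.
    print(wer, cer, relative_reduction(0.10, 0.08))
```

Because the reduction is relative, the same absolute gain counts for more when the baseline error rate is low, which is why relative and absolute improvements should not be compared directly.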

In this research, we propose novel techniques to improve automatic speech recognition (ASR) and statistical machine translation (SMT) for dialectal Arabic. Since dialectal Arabic speech resources are very sparse, we describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic acoustic modeling. Our assumption is that MSA is always a second language for all Arabic speakers, and that in most cases we can identify a speaker's original dialect even when he or she is speaking MSA. Hence, an acoustic model trained with a sufficient number of MSA speakers will implicitly model the acoustic features of the different Arabic dialects. Since MSA and dialectal Arabic do not share the same phoneme set, we propose phoneme set normalization in order to use MSA cross-lingually in dialectal Arabic ASR. After normalization, we applied state-of-the-art acoustic model adaptation techniques to adapt MSA acoustic models using a small amount of dialectal speech. Results indicate a significant decrease in word error rate (WER). Since it is hard to phonetically transcribe large amounts of dialectal Arabic speech, we also studied graphemic acoustic models, in which a word's phonetic transcription is approximated by its letters instead of its phonemes. A large number of Gaussians in the Gaussian mixture model is used to model the missing vowels. With graphemic adaptation, a significant decrease in WER was also observed. The approaches were applied to Egyptian Arabic and Levantine Arabic. The reported experimental work was performed while the first author was at the German University in Cairo in collaboration with Ulm University. This work will be extended at Qatar University in collaboration with the University of Illinois to cover ASR and SMT for Qatari broadcast TV.
We propose novel algorithms for learning the similarities and differences between Qatari Arabic (QA) and MSA for the purposes of automatic speech translation and speech-to-text machine translation, building on our own definitive research into the relative phonological, morphological, and syntactic systems of QA and MSA, and into the application of translation to interlingual semantic parse. Furthermore, we propose a novel, efficient, and accurate speech-to-text translation system, building on our research in landmark-based and segment-based ASR.
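The graphemic acoustic models described in the dialectal Arabic abstract replace the phonetic lexicon with one in which each word is "pronounced" as its sequence of letters. The minimal Python sketch below builds such a lexicon. It assumes words in Arabic script and assumes that optional diacritics (short-vowel marks, tanween, shadda, sukun; U+064B to U+0652) are stripped so that every entry uses only the written letters. The function names are hypothetical, and this is not the authors' tooling.

```python
import re
from typing import Dict, Iterable, List

# Optional Arabic diacritics U+064B..U+0652 (tanween, short-vowel marks, shadda, sukun).
# Stripping them is an assumption so that every entry uses only the written letters.
_ARABIC_DIACRITICS = re.compile("[\u064B-\u0652]")


def graphemic_pronunciation(word: str) -> List[str]:
    """Return the word's letters, used as its sequence of acoustic units."""
    return list(_ARABIC_DIACRITICS.sub("", word))


def build_graphemic_lexicon(words: Iterable[str]) -> Dict[str, str]:
    """Map each word to a space-separated grapheme 'pronunciation'."""
    lexicon: Dict[str, str] = {}
    for word in words:
        units = graphemic_pronunciation(word)
        if units:  # skip tokens that consist only of diacritics
            lexicon[word] = " ".join(units)
    return lexicon


if __name__ == "__main__":
    # "kitab" (book) written with kasra and fatha marks, which are removed; then "Misr" (Egypt).
    vocabulary = ["\u0643\u0650\u062a\u064e\u0627\u0628", "\u0645\u0635\u0631"]
    for word, units in build_graphemic_lexicon(vocabulary).items():
        print(word, units)
```

Since unwritten vowels get no units of their own, the acoustic model attached to each letter has to absorb the surrounding vowel sounds, which is consistent with the abstract's use of a large number of Gaussians per mixture. The space-separated output mirrors the word-then-units layout of common pronunciation dictionaries; adapting it to a particular toolkit's format is left as an assumption.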

Articles published on Decrease In Word Error Rate

Improving Deep Learning based Automatic Speech Recognition for Gujarati

Building a Speech and Text Corpus of Turkish: Large Corpus Collection with Initial Speech Recognition Results

Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

A Hybrid of Deep CNN and Bidirectional LSTM for Automatic Speech Recognition

Deep neural network acoustic models for spoken assessment applications

Adaptation to non-native speech using evolutionary-based discriminative linear transforms

An Experimental Study on Dynamic Features of Speech Structure

Challenges and Techniques for Dialectal Arabic Speech Recognition and Machine Translation

Integration of Statistical Models for Dictation of Document Translations in a Machine-Aided Human Translation Task

Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition

Word Extraction Associated with a Confidence Index for Online Handwritten Sentence Recognition

Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition

A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition

Template-Based Continuous Speech Recognition

Overall risk criterion estimation of hidden Markov model parameters

Performance of HMM-based speech recognizers with discriminative state-weights