Abstract

Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment on utterance selection indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29-26% to 10-8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment on utterance verification indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER for the other measures in isolation.
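
In utterance verification, the verification measure is thresholded to accept or reject an utterance, and the equal error rate (EER) quoted above is the operating point where the false-acceptance and false-rejection rates coincide. The following is a minimal sketch of how an EER can be computed from verification scores; the scores and data are hypothetical, not the paper's:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Find the operating point where the false-acceptance rate (FAR)
    and the false-rejection rate (FRR) are approximately equal.

    scores: verification scores, higher = more likely a correct utterance
    labels: 1 for correct utterances, 0 for incorrect ones
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, None
    for t in np.sort(np.unique(scores)):
        accept = scores >= t
        far = np.mean(accept[labels == 0])   # incorrect utterances accepted
        frr = np.mean(~accept[labels == 1])  # correct utterances rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical combined LR + duration scores for illustration
rng = np.random.default_rng(0)
correct = rng.normal(1.0, 0.5, 500)    # scores for correct utterances
incorrect = rng.normal(0.0, 0.5, 500)  # scores for incorrect utterances
scores = np.concatenate([correct, incorrect])
labels = np.concatenate([np.ones(500, int), np.zeros(500, int)])
print(f"EER = {equal_error_rate(scores, labels):.1%}")
```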

Highlights

  • The increasing demand for innovative applications that support language learning has led to a growing interest in Computer-Assisted Language Learning (CALL) systems that make use of automatic speech recognition (ASR) technology

  • We evaluated the speech decoding setups using the utterance error rate (UER), which is the percentage of utterances where the 1-Best decoding result deviates from the transcription (a sketch of this computation follows the list)

  • The utterance error rate (UER) of our speech decoder on the set of decoding results where the correct transcription was present in the language model (LM) was 10.0%
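
As a concrete illustration of the UER metric mentioned above, here is a minimal sketch that scores a list of 1-best hypotheses against reference transcriptions. The normalization and the example sentences are hypothetical; the paper's exact matching rules may differ:

```python
def utterance_error_rate(hypotheses, references):
    """UER: fraction of utterances whose 1-best decoding result
    deviates from the reference transcription (exact match after
    a simple text normalization)."""
    def normalize(text):
        return " ".join(text.lower().split())
    errors = sum(normalize(h) != normalize(r)
                 for h, r in zip(hypotheses, references))
    return errors / len(references)

# Hypothetical example: one of three utterances is misrecognized
hyps = ["i would like some water", "can you help me", "that is fine"]
refs = ["i would like some water", "could you help me", "that is fine"]
print(f"UER = {utterance_error_rate(hyps, refs):.1%}")  # UER = 33.3%
```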


Summary

Introduction

The increasing demand for innovative applications that support language learning has led to a growing interest in Computer-Assisted Language Learning (CALL) systems that make use of automatic speech recognition (ASR) technology. More problematic deviations may arise when the difficulty in perceiving and realizing phonetic features of the target language that are not distinctive in the mother tongue leads non-native speakers to blur the distinction between phonemes in the target language, producing one phoneme instead of two distinct ones. This is the case with many non-native speakers of English, for instance, Germans [6], who have difficulty in realizing the distinction between the English phonemes /ae/ and /e/ and often produce /e/ when /ae/ should be used, or Japanese speakers of English who have difficulty in distinguishing /l/ and /r/ [7] and may end up producing sounds that are neither an English /l/ nor an English /r/. Problems can also arise when speech sounds are inappropriately deleted or inserted, which is another common phenomenon in non-native speech [8].
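
The paragraph above names the three classic categories of non-native pronunciation error: substitutions (e.g., /e/ for /ae/), deletions, and insertions. A standard dynamic-programming alignment between a canonical phone sequence and the realized one makes these categories concrete; the sketch below uses hypothetical phone strings and is not the paper's error-detection method:

```python
def align_phones(canonical, realized):
    """Levenshtein alignment over phone sequences, labeling each edit
    as a substitution, deletion, or insertion. A sketch only; real
    non-native error detection is considerably more involved."""
    n, m = len(canonical), len(realized)
    # dp[i][j] = minimal edit cost between canonical[:i] and realized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i-1][j-1] + (canonical[i-1] != realized[j-1])
            dp[i][j] = min(sub, dp[i-1][j] + 1, dp[i][j-1] + 1)
    # Backtrace to recover the edit operations
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i-1][j-1] + (canonical[i-1] != realized[j-1])):
            if canonical[i-1] != realized[j-1]:
                ops.append(("substitution", canonical[i-1], realized[j-1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i-1][j] + 1:
            ops.append(("deletion", canonical[i-1], None))
            i -= 1
        else:
            ops.append(("insertion", None, realized[j-1]))
            j -= 1
    return list(reversed(ops))

# E.g., a German-accented rendering of "cat": /ae/ realized as /e/
print(align_phones(["k", "ae", "t"], ["k", "e", "t"]))
# -> [('substitution', 'ae', 'e')]
```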

