Abstract

Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment on utterance selection indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29-26% to 10-8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment on utterance verification indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER for the other measures in isolation.
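
In utterance verification, the verification measure is thresholded to accept or reject an utterance, and the equal error rate (EER) quoted above is the operating point where the false-acceptance and false-rejection rates coincide. The following is a minimal sketch of how an EER can be computed from verification scores; the scores and data are hypothetical, not the paper's:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Find the operating point where the false-acceptance rate (FAR)
    and the false-rejection rate (FRR) are approximately equal.

    scores: verification scores, higher = more likely a correct utterance
    labels: 1 for correct utterances, 0 for incorrect ones
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, None
    for t in np.sort(np.unique(scores)):
        accept = scores >= t
        far = np.mean(accept[labels == 0])   # incorrect utterances accepted
        frr = np.mean(~accept[labels == 1])  # correct utterances rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical combined LR + duration scores for illustration
rng = np.random.default_rng(0)
correct = rng.normal(1.0, 0.5, 500)    # scores for correct utterances
incorrect = rng.normal(0.0, 0.5, 500)  # scores for incorrect utterances
scores = np.concatenate([correct, incorrect])
labels = np.concatenate([np.ones(500, int), np.zeros(500, int)])
print(f"EER = {equal_error_rate(scores, labels):.1%}")
```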

Highlights

  • The increasing demand for innovative applications that support language learning has led to a growing interest in Computer-Assisted Language Learning (CALL) systems that make use of automatic speech recognition (ASR) technology

  • We evaluated the speech decoding setups using the utterance error rate (UER), which is the percentage of utterances where the 1-Best decoding result deviates from the transcription (a sketch of this computation follows the list)

  • The utterance error rate (UER) of our speech decoder on the set of decoding results where the correct transcription was present in the language model (LM) was 10.0%
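
As a concrete illustration of the UER metric mentioned above, here is a minimal sketch that scores a list of 1-best hypotheses against reference transcriptions. The normalization and the example sentences are hypothetical; the paper's exact matching rules may differ:

```python
def utterance_error_rate(hypotheses, references):
    """UER: fraction of utterances whose 1-best decoding result
    deviates from the reference transcription (exact match after
    a simple text normalization)."""
    def normalize(text):
        return " ".join(text.lower().split())
    errors = sum(normalize(h) != normalize(r)
                 for h, r in zip(hypotheses, references))
    return errors / len(references)

# Hypothetical example: one of three utterances is misrecognized
hyps = ["i would like some water", "can you help me", "that is fine"]
refs = ["i would like some water", "could you help me", "that is fine"]
print(f"UER = {utterance_error_rate(hyps, refs):.1%}")  # UER = 33.3%
```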


Summary

Introduction

The increasing demand for innovative applications that support language learning has led to a growing interest in Computer-Assisted Language Learning (CALL) systems that make use of automatic speech recognition (ASR) technology. More problematic deviations may arise when the difficulty in perceiving and realizing phonetic features of the target language that are not distinctive in the mother tongue leads non-native speakers to blur the distinction between phonemes in the target language, producing one phoneme instead of two distinct ones. This is the case with many non-native speakers of English, for instance, Germans [6], who have difficulty in realizing the distinction between the English phonemes /ae/ and /e/ and often produce /e/ when /ae/ should be used, or Japanese speakers of English who have difficulty in distinguishing /l/ and /r/ [7] and may end up producing sounds that are neither an English /l/ nor an English /r/. Problems can also arise when speech sounds are inappropriately deleted or inserted, which is another common phenomenon in non-native speech [8].
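
The paragraph above names the three classic categories of non-native pronunciation error: substitutions (e.g., /e/ for /ae/), deletions, and insertions. A standard dynamic-programming alignment between a canonical phone sequence and the realized one makes these categories concrete; the sketch below uses hypothetical phone strings and is not the paper's error-detection method:

```python
def align_phones(canonical, realized):
    """Levenshtein alignment over phone sequences, labeling each edit
    as a substitution, deletion, or insertion. A sketch only; real
    non-native error detection is considerably more involved."""
    n, m = len(canonical), len(realized)
    # dp[i][j] = minimal edit cost between canonical[:i] and realized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i-1][j-1] + (canonical[i-1] != realized[j-1])
            dp[i][j] = min(sub, dp[i-1][j] + 1, dp[i][j-1] + 1)
    # Backtrace to recover the edit operations
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i-1][j-1] + (canonical[i-1] != realized[j-1])):
            if canonical[i-1] != realized[j-1]:
                ops.append(("substitution", canonical[i-1], realized[j-1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i-1][j] + 1:
            ops.append(("deletion", canonical[i-1], None))
            i -= 1
        else:
            ops.append(("insertion", None, realized[j-1]))
            j -= 1
    return list(reversed(ops))

# E.g., a German-accented rendering of "cat": /ae/ realized as /e/
print(align_phones(["k", "ae", "t"], ["k", "e", "t"]))
# -> [('substitution', 'ae', 'e')]
```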

