Abstract

General-purpose automatic speech recognition (ASR) systems have improved in quality and are being used for pronunciation assessment. However, the assessment of isolated short utterances, such as words in minimal pairs for segmental approaches, remains an important challenge, even more so for non-native speakers. In this work, we compare the performance of our own tailored ASR system (kASR) with the one of Google ASR (gASR) for the assessment of Spanish minimal pair words produced by 33 native Japanese speakers in a computer-assisted pronunciation training (CAPT) scenario. Participants in a pre/post-test training experiment spanning four weeks were split into three groups: experimental, in-classroom, and placebo. The experimental group used the CAPT tool described in the paper, which we specially designed for autonomous pronunciation training. A statistically significant improvement for the experimental and in-classroom groups was revealed, and moderate correlation values between gASR and kASR results were obtained, in addition to strong correlations between the post-test scores of both ASR systems and the CAPT application scores found at the final stages of application use. These results suggest that both ASR alternatives are valid for assessing minimal pairs in CAPT tools, in the current configuration. Discussion on possible ways to improve our system and possibilities for future research are included.

Highlights

  • Licensee MDPI, Basel, Switzerland.Recent advances in automatic speech recognition (ASR) have made this technology a potential solution for transcribing audio input for computer-assisted pronunciation training (CAPT) tools [1,2]

  • We considered a comparison with other Spanish ASR systems [26] not to be informative or fair to carry out, since our kASR system is not a general-purpose one, but a tailored ASR with a closed set of words related to minimal pairs

  • Regarding our kASR system, we achieved values lower than 5.0% for native speech for the specific battery of minimal pairs introduced in Section 2 (e.g., All model: 0.0024%)

Read more

Summary

Introduction

Recent advances in automatic speech recognition (ASR) have made this technology a potential solution for transcribing audio input for computer-assisted pronunciation training (CAPT) tools [1,2]. ASR systems can help in the assessment and feedback of learner production, reducing human costs [4,5]. Most of the scarce empirical studies which include ASR technology in CAPT tools assess sentences in large portions of either reading or spontaneous speech [6,7], the assessment of words in isolation remains a substantial challenge [8,9]. General-purpose off-the-shelf ASR systems such as Google ASR

Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.