Abstract

A computer assisted pronunciation teaching system (CAPT) is a fundamental component in a computer assisted language learning system (CALL). A speech recognition based CAPT system often requires a large amount of speech data to train the incorrect phone models in its speech recognizer. But collecting incorrectly pronounced speech data is a labor intensive and costly work. This paper reports an effort on training the incorrect phone models by making use of synthesized speech data. A special formant speech synthesizer is designed to filter the correctly pronounced phones into incorrect phones by modifying the formant frequencies. In a Chinese Putonghua CALL system for native Cantonese speakers to learn Mandarin, a small experimental CAPT system is built with a synthetic speech data trained recognizer. Evaluation shows that a CAPT system using synthesized data can perform as good as or even better than that using real data provided that the size of the synthetic data are large enough.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.