Abstract

The presented paper describes a novel approach to the detection of pronunciation errors. It makes use of the modeling of well-pronounced and mispronounced phonemes by means of the Dynamic Time Warping (DTW) algorithm. Four approaches that make use of the DTW phoneme modeling were developed to detect pronunciation errors: Variations of the Word Structure (VoWS), Normalized Phoneme Distances Thresholding (NPDT), Furthest Segment Search (FSS) and Normalized Furthest Segment Search (NFSS). The performance evaluation of each module was carried out using a speech database of correctly and incorrectly pronounced words in the Polish language, with up to 10 patterns of every trained word from a set of 12 words having different phonetic structures. The performance of DTW modeling was compared to Hidden Markov Models (HMM) that were used for the same four approaches (VoWS, NPDT, FSS, NFSS). The average error rate (AER) was the lowest for DTW with NPDT (AER=0.287) and scored better than HMM with FSS (AER=0.473), which was the best result for HMM. The DTW modeling was faster than HMM for all four approaches. This technique can be used for computer-assisted pronunciation training systems that can work with a relatively small training speech corpus (less than 20 patterns per word) to support speech therapy at home.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call