Learning languages in addition to the native language is very important for all people in the globalized world today, and computer-aided pronunciation training (CAPT) is attractive since the software can be used anywhere at any time, and repeated as many times as desired. In this paper, we introduce the immersive interaction scenario offered by spoken dialogues to CAPT by proposing a recursive dialogue game to make CAPT personalized. A number of tree-structured sub-dialogues are linked sequentially and recursively as the script for the game. The system policy at each dialogue turn is to select in real-time along the dialogue the best training sentence for each specific individual learner within the dialogue script, considering the learner's learning status and the future possible dialogue paths in the script, such that the learner can have the scores for all pronunciation units considered reaching a predefined standard in a minimum number of turns. The purpose here is that those pronunciation units poorly produced by the specific learner can be offered with more practice opportunities in the future sentences along the dialogue, which enables the learner to improve the pronunciation without having to repeat the same training sentences many times. This makes the learning process for each learner completely personalized. The dialogue policy is modeled by Markov decision process (MDP) with high-dimensional continuous state space, and trained with fitted value iteration using a huge number of simulated learners. These simulated leaners have the behavior similar to real learners, and were generated from a corpus of real learner data. The experiments demonstrated very promising results and a real cloud-based system is also successfully implemented.
Read full abstract