Optimizing spoken dialogue management with fitted value iteration

Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin

doi:10.21437/interspeech.2010-40

Abstract

In recent years, machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem as a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet the dialogue state space is usually very large (even infinite), and standard RL algorithms fail to handle it. In this paper we explore the possibility of using a generalization framework for dialogue management, namely a particular fitted value iteration algorithm (fitted-Q iteration). We show that fitted-Q, when applied to continuous state space dialogue management problems, can generalize well and makes efficient use of samples to learn the approximate optimal state-action value function. Our experimental results show that fitted-Q performs significantly better than the hand-coded policy and relatively better than the policy learned using least-squares policy iteration (LSPI), another generalization algorithm.

Index Terms: Spoken Dialogue Systems, Dialogue Management, Reinforcement Learning
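To make the idea of fitted-Q iteration concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a linear parameterization of the state-action value function fitted by ridge-regularized least squares on a fixed batch of transitions. The toy one-dimensional task, the one-hot binned features, and every name in the code (`featurize`, `make_toy_transitions`, `fitted_q_iteration`, `greedy_action`, `N_BINS`, `N_ACTIONS`) are hypothetical illustrations and do not come from the paper or its dialogue task.

```python
import numpy as np

N_BINS = 10      # hypothetical: number of bins over the state interval [0, 1]
N_ACTIONS = 2    # hypothetical: action 0 moves left, action 1 moves right


def featurize(state, action):
    """One-hot feature over (state bin, action); an assumed feature choice."""
    vec = np.zeros(N_BINS * N_ACTIONS)
    state_bin = min(int(state * N_BINS), N_BINS - 1)
    vec[action * N_BINS + state_bin] = 1.0
    return vec


def make_toy_transitions(n_samples=500, seed=0):
    """Sample (s, a, r, s_next, done) tuples from a toy continuous-state task."""
    rng = np.random.default_rng(seed)
    transitions = []
    for _ in range(n_samples):
        s = float(rng.random())
        a = int(rng.integers(N_ACTIONS))
        step = 0.1 if a == 1 else -0.1
        s_next = min(max(s + step, 0.0), 1.0)
        done = s_next >= 1.0          # reaching the right end terminates
        r = 1.0 if done else 0.0      # reward only at the goal
        transitions.append((s, a, r, s_next, done))
    return transitions


def fitted_q_iteration(transitions, gamma=0.9, n_iters=50, ridge=1e-6):
    """Repeatedly regress Q(s, a) onto r + gamma * max_a' Q(s', a')."""
    phi = np.array([featurize(s, a) for s, a, _, _, _ in transitions])
    rewards = np.array([r for _, _, r, _, _ in transitions])
    not_done = np.array([0.0 if d else 1.0 for _, _, _, _, d in transitions])
    # Features of every (s_next, a') pair, shape (N, N_ACTIONS, dim).
    phi_next = np.array([[featurize(s2, b) for b in range(N_ACTIONS)]
                         for _, _, _, s2, _ in transitions])
    dim = phi.shape[1]
    # The inputs never change, so the regularized Gram inverse is reused.
    gram_inv = np.linalg.inv(phi.T @ phi + ridge * np.eye(dim))
    theta = np.zeros(dim)
    for _ in range(n_iters):
        q_next = phi_next @ theta                       # shape (N, N_ACTIONS)
        targets = rewards + gamma * not_done * q_next.max(axis=1)
        theta = gram_inv @ (phi.T @ targets)            # least-squares fit
    return theta


def greedy_action(theta, state):
    """Action with the highest estimated Q-value in the given state."""
    return int(np.argmax([featurize(state, a) @ theta for a in range(N_ACTIONS)]))


theta = fitted_q_iteration(make_toy_transitions())
print("greedy action at s=0.5:", greedy_action(theta, 0.5))
```

The one-hot binned features were chosen here because they average the regression targets within each cell, which keeps the iteration stable on this toy problem; richer feature sets generalize more but can require more care.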

Full Text