Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences

Olivier Pietquin, Matthieu Geist, Senthilkumar Chandramohan

doi:10.5591/978-1-57735-516-8/ijcai11-314

Abstract

Designing dialog policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialog task because cross-domain transfer is not possible. For this reason, machine learning methods for dialog policy optimization have been investigated over the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes in the policy on users to assess whether they are improvements or degradations. This is called on-policy learning. However, it can result in system behaviors that are not acceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, by learning off-policy. In this contribution, a sample-efficient, online and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.
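To make the off-policy idea concrete, here is a minimal sketch in Python. It is not the Kalman temporal differences algorithm of the paper: it uses plain tabular Q-learning as a stand-in, on a hypothetical two-slot dialogue task invented for illustration. The simulator, the action names (`ASK`, `CLOSE`), the rewards and the 70/30 behaviour policy are all assumptions. The point is only that a greedy policy can be estimated from dialogues logged under a fixed, non-optimal handcrafted policy, without ever trying exploratory actions on users.

```python
import random

N_SLOTS = 2          # hypothetical toy task: two slots to fill
ASK, CLOSE = 0, 1    # hypothetical system actions
GAMMA = 0.95         # discount factor (assumed value)

def step(state, action):
    """Toy dialogue simulator: the state is the number of filled slots."""
    if action == ASK:
        # Asking costs one turn and fills one more slot (capped at N_SLOTS).
        return min(state + 1, N_SLOTS), -1.0, False
    # Closing ends the dialogue; it succeeds only if every slot is filled.
    reward = 10.0 if state == N_SLOTS else -10.0
    return state, reward, True

def behaviour_policy(rng):
    """Simple handcrafted policy: ask 70% of the time, otherwise close."""
    return ASK if rng.random() < 0.7 else CLOSE

def collect_dialogues(n_dialogues, rng):
    """Log (s, a, r, s', done) transitions generated by the behaviour policy."""
    log = []
    for _ in range(n_dialogues):
        state, done = 0, False
        while not done:
            action = behaviour_policy(rng)
            next_state, reward, done = step(state, action)
            log.append((state, action, reward, next_state, done))
            state = next_state
    return log

def q_learning_from_log(log, sweeps=50, alpha=0.1):
    """Off-policy update: the target uses max over actions, not the logged one."""
    q = [[0.0, 0.0] for _ in range(N_SLOTS + 1)]
    for _ in range(sweeps):
        for s, a, r, s2, done in log:
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
    return q

rng = random.Random(0)
log = collect_dialogues(300, rng)   # a few hundred logged dialogues
q = q_learning_from_log(log)
# Greedy policy extracted from the learned Q-values, one action per state.
greedy = [max((ASK, CLOSE), key=lambda a: q[s][a]) for s in range(N_SLOTS + 1)]
print(greedy)
```

In this toy setting the learned greedy policy asks until both slots are filled and then closes, even though the logged behaviour policy closes at random. A tabular method like this one needs many passes over the data and does not scale to realistic dialogue state spaces, which is where sample-efficient value function approximation becomes relevant.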

Similar Papers

Simultaneous feature selection and parameter optimization for training of dialog policy by reinforcement learning
Teruhisa Misu ... Hideki Kashioka
01 Dec 2012

An emotion-sensitive dialogue policy for task-oriented dialogue system
Hui Zhu ... Kai Xv
Scientific Reports | VOL. 14
26 Aug 2024

Off-policy learning in large-scale POMDP-based dialogue systems
Lucie Daubigney ... Matthieu Geist
01 Mar 2012

Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog
Ryuichi Takanobu ... Hanlin Zhu
01 Jan 2019
