Abstract

We apply Reinforcement Learning (RL) to the problem of incremental dialogue policy learning in the context of a fast-paced dialogue game. We compare the policy learned by RL with a high-performance baseline policy that has been shown to perform very efficiently (nearly as well as humans) in this dialogue game. The RL policy outperforms the baseline policy in offline simulations based on real user data. We provide a detailed comparison of the RL policy and the baseline policy, including how much effort and time it took to develop each of them. We also highlight the cases where the RL policy performs better, and show that understanding the RL policy can provide valuable insights that can inform the creation of an even better rule-based policy.

Highlights

  • Building incremental spoken dialogue systems (SDSs) has recently attracted much attention

  • Our contributions are as follows: We provide a Reinforcement Learning (RL) method for incremental dialogue processing based on simple features which performs better in offline simulations than the high-performance carefully designed rule (CDR) baseline

  • The policy learned using RL (Least-Squares Policy Iteration (LSPI) with radial basis functions (RBFs)) performs significantly better than the CDR baseline


Summary

Introduction

Building incremental spoken dialogue systems (SDSs) has recently attracted much attention. Our contributions are as follows: we provide an RL method for incremental dialogue processing, based on simple features, which performs better in offline simulations (based on real user data) than the high-performance carefully designed rule (CDR) baseline. Note that this is a very strong baseline, which has been shown to perform very efficiently (nearly as well as humans) in this dialogue game (Paetzel et al., 2015). In much prior work, the rule-based baselines against which RL policies are compared are not as carefully engineered as they could be, i.e., they are not the result of iterative improvement and optimization using insights learned from data or user testing. This is understandable, since building a very strong baseline would be a big project in itself and would detract attention from the RL problem. We highlight the cases where the RL policy performs better, and show that understanding the RL policy can provide valuable insights which can inform the creation of an even better rule-based policy.
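To make the approach concrete, below is a minimal sketch, in Python with NumPy, of a linear Q-value approximation over radial basis function (RBF) features together with an LSTD-Q update, the policy-evaluation step of LSPI. The state features (elapsed game time, confidence of the current best interpretation), the action set (WAIT vs. GUESS), and the RBF centers are illustrative assumptions for this sketch, not the feature or action design used in the paper.

```python
import numpy as np

# Sketch of an incremental dialogue policy: a linear Q-value over RBF features,
# learned with LSTD-Q (the policy-evaluation step of LSPI).
# The actions, state features, and RBF centers below are illustrative assumptions.

ACTIONS = ["WAIT", "GUESS"]

# Hypothetical RBF centers over a 2-D state:
# (normalized elapsed time, confidence of the current best interpretation).
CENTERS = np.array([[t, c] for t in (0.0, 0.5, 1.0) for c in (0.0, 0.5, 1.0)])
SIGMA = 0.25
K = len(ACTIONS) * len(CENTERS)  # total number of features


def rbf_features(state, action):
    """Map a (state, action) pair to a block-sparse vector of RBF activations."""
    acts = np.exp(-np.sum((CENTERS - state) ** 2, axis=1) / (2 * SIGMA ** 2))
    phi = np.zeros(K)
    i = ACTIONS.index(action)
    phi[i * len(CENTERS):(i + 1) * len(CENTERS)] = acts
    return phi


def q_value(weights, state, action):
    """Linear Q-value: dot product of the weight vector and the RBF features."""
    return weights @ rbf_features(state, action)


def lstdq(samples, weights, gamma=0.99):
    """One LSTD-Q step over (s, a, r, s', done) samples: solve A w' = b,
    acting greedily with the current weights at the next state."""
    A = np.zeros((K, K))
    b = np.zeros(K)
    for s, a, r, s_next, done in samples:
        phi = rbf_features(s, a)
        if done:
            phi_next = np.zeros(K)
        else:
            a_next = max(ACTIONS, key=lambda act: q_value(weights, s_next, act))
            phi_next = rbf_features(s_next, a_next)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    # Small ridge term keeps the system well conditioned.
    return np.linalg.solve(A + 1e-6 * np.eye(K), b)


def policy(weights, state):
    """Greedy incremental decision: keep listening or commit to a guess."""
    return max(ACTIONS, key=lambda a: q_value(weights, state, a))
```

In LSPI, the lstdq step would be repeated on a fixed batch of logged game transitions, starting from (for example) a zero weight vector, until the weights stop changing; the resulting greedy policy then decides, at each incremental ASR update, whether to keep listening or commit to a guess.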

RDG-Image Game
Human-Human Data
Improving NLU with Agent Conversation Data
Room for Improvement
Design of the RL Policy
Experimental Setup
Results
Discussion & Future
Contrasting Baseline and RL Policy Building Efforts
Future Work
