Abstract

Learning suitable, well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most reinforcement-learning-based work models the reward signal with an objective measure such as task success, we propose a reward signal based on user satisfaction. We introduce a novel estimator and show that it outperforms all previous estimators while implicitly learning temporal dependencies. In simulated experiments, we show that a live user satisfaction estimation model can be applied to achieve higher estimated satisfaction while maintaining similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain can be applied in many other domains that cover a similar task. We verify these findings by employing the model in one of the domains to learn a policy from real users, and compare its performance to policies whose reward uses user satisfaction and task success acquired directly from the users.

Highlights

  • Spoken dialogue systems (SDSs) enable voice interaction between technical systems and humans

  • We argue that training a system to maximise user satisfaction (US) is a good alternative to task success (TS): user satisfaction more accurately reflects the user’s view and whether the user is likely to use the system again in the future

  • This article has demonstrated that employing a user satisfaction reward estimator for learning dialogue policies without any knowledge about the domain can yield good performance in terms of both task success rate and user satisfaction

Summary

Introduction

Spoken dialogue systems (SDSs) enable voice interaction between technical systems and humans, and have increasingly found their way into our everyday lives. One prominent way of modelling the decision-making component of a spoken dialogue system is to use (partially observable) Markov decision processes ((PO)MDPs) (Lemon and Pietquin, 2012; Young et al., 2013). Task-oriented dialogue systems traditionally model the reward r, which guides the learning process, with task success as the principal reward component (Gašić and Young, 2014; Lemon and Pietquin, 2007; Daubigney et al., 2012; Levin and Pieraccini, 1997; Singh et al., 2002; Young et al., 2013; Su et al., 2015, 2016).
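To make the contrast concrete, the following is a minimal sketch of the two reward-shaping schemes discussed above: a classic objective reward built from task success and dialogue length, and a subjective variant where a user-satisfaction rating replaces the binary success bonus. The function names and constants are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of per-dialogue reward shaping in a (PO)MDP dialogue policy.
# All names and constants are illustrative, not from the paper.

def task_success_reward(num_turns: int, success: bool,
                        success_bonus: int = 20, turn_penalty: int = 1) -> int:
    """Objective reward: a bonus for task success minus a per-turn cost."""
    return (success_bonus if success else 0) - turn_penalty * num_turns

def user_satisfaction_reward(num_turns: int, satisfaction: int,
                             scale: int = 5, turn_penalty: int = 1) -> int:
    """Subjective reward: a scaled user-satisfaction rating (e.g. 1-5)
    replaces the binary success bonus."""
    return scale * satisfaction - turn_penalty * num_turns

# A successful 8-turn dialogue under the objective scheme:
print(task_success_reward(8, True))        # 12
# The same dialogue rated 4/5 by the user under the subjective scheme:
print(user_satisfaction_reward(8, 4))      # 12
```

Note that under the satisfaction-based scheme a dialogue that technically succeeds but frustrates the user (e.g. rated 1/5) receives a much lower reward, which is precisely the distinction the paper's user-satisfaction reward estimator is meant to capture.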

