Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

Stefan Ultes

doi:10.18653/v1/w19-5902

Abstract

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we use a reward based on user satisfaction estimation. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. Furthermore, we apply this novel user satisfaction estimation model live in simulated experiments where the satisfaction estimation model is trained on one domain and applied in many other domains which cover a similar task. We show that applying this model results in higher estimated satisfaction, similar task success rates and a higher robustness to noise.

Highlights

One prominent way of modelling the decision-making component of a spoken dialogue system (SDS) is to use (partially observable) Markov decision processes ((PO)MDPs) (Lemon and Pietquin, 2012; Young et al., 2013)
In this work, we proposed a novel model for interaction quality estimation based on BiLSTMs with an attention mechanism that clearly outperformed the baseline while learning all temporal dependencies implicitly
We analysed the impact of the performance increase on learned policies that use this interaction quality estimator as the principal reward component
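
The attention step mentioned in the highlights can be illustrated with a minimal sketch. It assumes that some recurrent encoder (for example a BiLSTM) has already produced one hidden vector per dialogue turn, and it only shows how those vectors could be pooled into a single dialogue-level vector. The dot-product scoring, the names attention_pool and scoring_vector, and the toy numbers are assumptions made for illustration; the text above does not specify the exact attention formulation used in the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scoring_vector):
    """Pool per-turn hidden vectors into one context vector.

    hidden_states: one vector per dialogue turn (assumed to come from an
        encoder such as a BiLSTM, which is not implemented here).
    scoring_vector: hypothetical learned parameter used to score each turn.
    """
    # Score each turn with a dot product (an assumed scoring function).
    scores = [
        sum(h_i * v_i for h_i, v_i in zip(h, scoring_vector))
        for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the turn vectors.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three turns, two hidden dimensions (made-up values).
hidden_states = [[0.1, 0.3], [0.4, -0.2], [0.9, 0.5]]
scoring_vector = [1.0, 0.5]
weights, context = attention_pool(hidden_states, scoring_vector)
```

In a full estimator, the context vector would then be fed to a classifier that predicts the interaction quality label, so no handcrafted temporal features are needed at this stage.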

Summary

Introduction

One prominent way of modelling the decision-making component of a spoken dialogue system (SDS) is to use (partially observable) Markov decision processes ((PO)MDPs) (Lemon and Pietquin, 2012; Young et al., 2013). There, reinforcement learning (RL) (Sutton and Barto, 1998) is applied to find the optimal system behaviour represented by the policy π. Task-oriented dialogue systems traditionally model the reward r, which guides the learning process, with task success as the principal reward component (Gašić and Young, 2014; Lemon and Pietquin, 2007; Daubigney et al., 2012; Levin and Pieraccini, 1997; Young et al., 2013; Su et al., 2015, 2016). A reward based on user satisfaction estimation is an alternative, but the statistical user satisfaction estimator applied so far relies heavily on handcrafted temporal features, and the impact of the estimation performance on the resulting dialogue policy remains unclear.
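
To make the contrast between the two kinds of reward r concrete, here is a small sketch of a task-success reward next to a reward driven by an estimated satisfaction score. All constants (a penalty of 1 per turn, a success bonus of 20, a quality scale from 1 to 5, a scaling factor of 5) and both function names are assumptions chosen for illustration; they are not stated in the text above and should not be read as the paper's exact definition.

```python
TURN_PENALTY = 1      # assumed cost per dialogue turn
SUCCESS_BONUS = 20    # assumed bonus for a successful dialogue
QUALITY_MIN = 1       # assumed lowest satisfaction label
QUALITY_MAX = 5       # assumed highest satisfaction label
QUALITY_SCALE = 5     # assumed factor so both rewards share the same range


def task_success_reward(num_turns, success):
    """Reward built on an objective measure: did the task succeed?"""
    bonus = SUCCESS_BONUS if success else 0
    return -TURN_PENALTY * num_turns + bonus


def satisfaction_reward(num_turns, estimated_quality):
    """Reward built on the output of a satisfaction estimator."""
    if not QUALITY_MIN <= estimated_quality <= QUALITY_MAX:
        raise ValueError("estimated_quality outside the assumed scale")
    return -TURN_PENALTY * num_turns + (estimated_quality - QUALITY_MIN) * QUALITY_SCALE


# A ten-turn dialogue under both reward definitions.
success_r = task_success_reward(10, True)
failure_r = task_success_reward(10, False)
best_quality_r = satisfaction_reward(10, 5)
worst_quality_r = satisfaction_reward(10, 1)
```

With these assumed constants, both rewards span the same range for a given dialogue length, but the satisfaction-based reward is graded rather than binary, which is why the accuracy of the estimator matters for the learned policy.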

Publication Date: Jan 1, 2019
Citations: 31
License type: cc-by

Similar Papers

User Satisfaction Reward Estimation Across Domains: Domain-independent Dialogue Policy Learning
Stefan Ultes ... Wolfgang Maier
Dialogue & Discourse | VOL. 12
28 Sep 2021

Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning
Stefan Ultes ... Milica Gašić
20 Aug 2017

Natural Language Generation as Incremental Planning Under Uncertainty: Adaptive Information Presentation for Statistical Dialogue Systems
Verena Rieser ... Simon Keizer
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22
01 May 2014

Statistical Spoken Dialogue Systems and the Challenges for Machine Learning
Steve Young
02 Feb 2017