Abstract

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure such as task success to model the reward signal, we use a reward based on user satisfaction estimation. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. Furthermore, we apply this user satisfaction estimation model live in simulated experiments in which the model is trained on one domain and applied in many other domains that cover a similar task. We show that applying this model results in higher estimated satisfaction, similar task success rates, and higher robustness to noise.

Highlights

  • One prominent way of modelling the decision-making component of a spoken dialogue system (SDS) is to use (partially observable) Markov decision processes ((PO)MDPs) (Lemon and Pietquin, 2012; Young et al., 2013).

  • In this work we proposed a novel model for interaction quality estimation based on BiLSTMs with an attention mechanism that clearly outperformed the baseline while learning all temporal dependencies implicitly.

  • We analysed the impact of the performance increase on learned policies that use this interaction quality estimator as the principal reward component.

Summary

Introduction

One prominent way of modelling the decision-making component of a spoken dialogue system (SDS) is to use (partially observable) Markov decision processes ((PO)MDPs) (Lemon and Pietquin, 2012; Young et al., 2013). There, reinforcement learning (RL) (Sutton and Barto, 1998) is applied to find the optimal system behaviour, represented by the policy π. Task-oriented dialogue systems traditionally model the reward r, which guides the learning process, with task success as the principal reward component (Gasic and Young, 2014; Lemon and Pietquin, 2007; Daubigney et al., 2012; Levin and Pieraccini, 1997; Young et al., 2013; Su et al., 2015, 2016). The applied statistical user satisfaction estimator relies heavily on handcrafted temporal features. The impact of the estimation performance on the resulting dialogue policy remains unclear.
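To make the contrast between the two reward designs concrete, the following is a minimal sketch of a task-success-based reward next to a satisfaction-based one. It is an illustration under stated assumptions, not the formulation used in the paper: the names `DialogueOutcome`, `task_success_reward`, and `satisfaction_reward`, the per-turn penalty, the success bonus, the scaling factor, and the 1 to 5 satisfaction scale are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class DialogueOutcome:
    """Summary of one finished dialogue (hypothetical structure)."""
    num_turns: int
    task_success: bool
    # Estimated user satisfaction at the end of the dialogue.
    # ASSUMPTION: a score on a 1 (worst) to 5 (best) scale.
    estimated_satisfaction: float


def task_success_reward(outcome: DialogueOutcome,
                        turn_penalty: float = 1.0,
                        success_bonus: float = 20.0) -> float:
    """Reward with task success as the principal component.

    ASSUMPTION: the constants are illustrative, not taken from the source.
    """
    bonus = success_bonus if outcome.task_success else 0.0
    return -turn_penalty * outcome.num_turns + bonus


def satisfaction_reward(outcome: DialogueOutcome,
                        turn_penalty: float = 1.0,
                        scale: float = 5.0,
                        min_score: float = 1.0) -> float:
    """Reward with estimated user satisfaction as the principal component.

    The estimate replaces the binary success signal, so the reward no
    longer needs to know the user's goal. ASSUMPTION: linear scaling of
    the score above its minimum; constants are illustrative.
    """
    return (-turn_penalty * outcome.num_turns
            + scale * (outcome.estimated_satisfaction - min_score))


if __name__ == "__main__":
    outcome = DialogueOutcome(num_turns=8, task_success=True,
                              estimated_satisfaction=4.0)
    print(task_success_reward(outcome))
    print(satisfaction_reward(outcome))
```

In a sketch like this, the quality of the satisfaction estimate directly shapes the reward, which is why the effect of estimator performance on the learned policy is worth studying.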
