Abstract

Reinforcement learning (RL) is an effective method for training dialogue policies to steer a conversation towards successful task completion. However, most RL-based methods rely only on semantic inputs and lack empathy because they ignore user emotion information. Moreover, these methods suffer from delayed rewards, since the user simulator returns a meaningful reward only at the end of the dialogue. Recently, some methods have been proposed that learn the reward function together with user emotions, but they do not consider the user's emotion at each dialogue turn. In this paper, we propose an emotion-sensitive dialogue policy model (ESDP) that incorporates user emotion information into the dialogue policy and selects the optimal action by combining the top-k candidate actions with the user's emotion. The emotion information in each turn is also used as an immediate reward for the current dialogue state, mitigating sparse rewards and the dependency on dialogue termination. Extensive experiments validate that our method outperforms baseline approaches when combined with different Q-learning algorithms, and also surpasses other popular existing dialogue policies.
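The two mechanisms named in the abstract — re-ranking the top-k Q-value actions with an emotion signal, and using the per-turn emotion as an immediate reward — could be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the linear combination, and the weights `alpha` and `beta` are our assumptions.

```python
import numpy as np

def select_action(q_values, emotion_scores, k=3, alpha=0.5):
    """Pick among the top-k actions by Q-value, re-scored by a
    per-action user-emotion score (higher = more positive emotion).
    The additive combination with weight alpha is an illustrative choice."""
    top_k = np.argsort(q_values)[-k:]  # indices of the k highest Q-values
    combined = q_values[top_k] + alpha * emotion_scores[top_k]
    return top_k[np.argmax(combined)]

def shaped_reward(task_reward, emotion_score, beta=0.2):
    """Add the current turn's emotion score as an immediate reward,
    mitigating the sparse, end-of-dialogue-only task reward."""
    return task_reward + beta * emotion_score
```

For example, an action with a slightly lower Q-value can be preferred when the user's emotional response to it is predicted to be more positive, while the shaped reward gives the Q-learner a non-zero signal at every turn instead of only at termination.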
