Emotion-sensitive deep Dyna-Q learning for task-completion dialogue policy learning

Rui Zhang, Zhenyu Wang, Mengdan Zheng, Yangyang Zhao, Zhenhua Huang

doi:10.1016/j.neucom.2021.06.075

Abstract

In recent years, task-oriented dialogue systems have received extensive attention from academia and industry. Training a dialogue agent through reinforcement learning is often costly because it requires many interactions with real users. Although the Deep Dyna-Q (DDQ) framework uses simulated experience to reduce the cost of direct reinforcement learning, it still suffers from challenges such as delayed rewards and policy degradation. This paper proposes an Emotion-Sensitive Deep Dyna-Q (ES-DDQ) model that (1) introduces an emotional world model, which uses emotion-related cues to improve the ability of the traditional DDQ framework to model and simulate users, and (2) designs two kinds of emotion-related immediate rewards to mitigate the delayed reward problem. Experimental results show that the proposed approach effectively simulates users’ behaviors and outperforms the state-of-the-art benchmarks.

Full Text