Abstract

In recent years, task-oriented dialogue systems have received extensive attention from academia and industry. Training a dialogue agent through reinforcement learning is often costly because it requires many interactions with real users. Although the Deep Dyna-Q (DDQ) framework uses simulated experience to reduce the cost of direct reinforcement learning, it still suffers from problems such as delayed rewards and policy degradation. This paper proposes an Emotion-Sensitive Deep Dyna-Q (ES-DDQ) model that (1) introduces an emotional world model, which incorporates emotion-related cues to improve how the traditional DDQ framework models and simulates users, and (2) designs two kinds of emotion-related immediate rewards to mitigate the delayed reward problem. Experimental results show that the proposed approach effectively simulates user behavior and outperforms state-of-the-art baselines.
