Abstract

Dialog management plays an important role in task-oriented dialog systems. Most previous works divide dialog management into a state tracker and an action selector. The two parts are modeled separately and implemented in a pipelined way, which suffers from error accumulation; moreover, the feedback signal from the action selector cannot be propagated back to the state tracker or the natural language understanding module. This paper proposes a word-based partially observable Markov decision process (POMDP) dialog management framework that integrates natural language understanding, state tracking, and action selection into an end-to-end architecture. The proposed dialog manager takes the words of user utterances as input and produces the optimal action as well as the slot values from natural language understanding that are needed for response generation. To this end, we propose a hybrid learning method that integrates reinforcement learning and supervised learning to optimize the action selector and the slot filler jointly. In addition, we develop high-return prioritized experience replay to speed up convergence of the training process. Experimental results show that the proposed dialog management outperforms four strong baselines across a series of dialog tasks, and human evaluation confirms these results. The high-return prioritized experience replay accelerates convergence effectively, especially when the proposed dialog management is applied to more complex tasks.
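The abstract does not specify the exact prioritization rule behind high-return prioritized experience replay. A minimal sketch of one plausible reading, assuming episode-level storage and sampling probability proportional to episode return (the class name, capacity, and `eps` floor are illustrative, not from the paper):

```python
import random

class HighReturnReplayBuffer:
    """Hypothetical sketch: a replay buffer that biases sampling
    toward dialog episodes with higher returns, so successful
    dialogs are replayed more often during RL training."""

    def __init__(self, capacity=1000, eps=1e-3):
        self.capacity = capacity
        self.eps = eps          # floor weight so zero-return episodes can still be drawn
        self.episodes = []      # list of (transitions, episode_return) pairs

    def add(self, transitions, episode_return):
        self.episodes.append((transitions, episode_return))
        if len(self.episodes) > self.capacity:
            self.episodes.pop(0)  # evict the oldest episode

    def sample(self, k):
        # Sample with probability proportional to episode return,
        # clipped below by eps to keep every episode reachable.
        weights = [max(ret, self.eps) for _, ret in self.episodes]
        return random.choices(self.episodes, weights=weights, k=k)
```

Under this sketch, high-return dialogs dominate the sampled batches early in training, which is one way such a buffer could speed convergence on complex tasks as the abstract describes.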

