Abstract

Deep reinforcement learning algorithms have been widely used to learn dialog policies in task-oriented dialog systems. The dialog agent collects training data and improves its policy by interacting with users. However, interacting with real users is time-consuming and often impractical, so a user simulator is usually built in place of a real user. At the beginning of each dialog, the simulator samples a user goal extracted from the training data and then communicates with the dialog agent to accomplish that goal. Existing user simulators usually sample this user goal uniformly at random, which causes the dialog agent to waste a lot of time relearning what it has already mastered. To solve this problem, we propose two user goal weighting methods that assign relatively large weights to the user goals the current dialog agent cannot accomplish, so that the agent pays more attention to those goals. Experimental results on a movie-ticket booking task show that the proposed weighted user goal sampling methods effectively accelerate policy learning compared to random sampling.
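The excerpt does not include the authors' implementation, but the core idea lends itself to a minimal sketch: track per-goal success statistics and sample goals with probability proportional to a smoothed failure rate, so goals the current policy fails on are drawn more often. All names here (`WeightedGoalSampler`, `record_outcome`, `sample_goal`, the `smoothing` parameter) are hypothetical illustrations, not the paper's actual weighting methods.

```python
import random
from collections import defaultdict

class WeightedGoalSampler:
    """Hypothetical sketch: sample user goals with weight proportional
    to the agent's recent failure rate on each goal, so goals the
    current policy cannot yet accomplish are sampled more often."""

    def __init__(self, goals, smoothing=1.0):
        self.goals = list(goals)           # user goals extracted from training data
        self.smoothing = smoothing         # keeps every goal's weight strictly positive
        self.successes = defaultdict(int)  # per-goal success counts
        self.attempts = defaultdict(int)   # per-goal attempt counts

    def record_outcome(self, goal_id, success):
        """Update statistics after each simulated dialog."""
        self.attempts[goal_id] += 1
        if success:
            self.successes[goal_id] += 1

    def sample_goal(self):
        """Draw a goal; weight = smoothed failure rate on that goal."""
        weights = []
        for i in range(len(self.goals)):
            n, s = self.attempts[i], self.successes[i]
            # Additive smoothing: goals never attempted get weight 1,
            # frequently solved goals decay toward smoothing / (n + smoothing).
            weights.append((n - s + self.smoothing) / (n + self.smoothing))
        idx = random.choices(range(len(self.goals)), weights=weights, k=1)[0]
        return idx, self.goals[idx]
```

In use, the simulator would draw a goal from the sampler at the start of each episode, run the dialog against the agent, and feed the success/failure outcome back via `record_outcome`, so the sampling distribution continually tracks the current policy's weaknesses rather than staying uniform.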
