Abstract

Training a satisfactory dialogue policy via Reinforcement Learning (RL) incurs significant interaction costs because rewards in task-oriented dialogue are delayed and sparse. Investigating how to conduct efficient dialogue policy learning in such environments is therefore essential. Existing approaches obtain more positive rewards by incorporating an RL-based teacher model that customizes a curriculum matched to the ability of the dialogue policy. However, such a teacher model still inherits a key disadvantage of reinforcement learning: expensive training costs. We therefore develop a novel framework, cold-start curriculum learning (CCL), for task-oriented dialogue policy learning that requires no training cost for curriculum scheduling. In addition, it adaptively adjusts the difficulty of user goals and selects the next goal based on student feedback. Experiments show that CCL significantly improves the effectiveness of dialogue tasks without any training cost for curriculum scheduling.
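To make the curriculum idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual CCL algorithm) of a teacher-free scheduler that adjusts user-goal difficulty from the student's recent success rate; the class name, goal pools, thresholds, and window size are illustrative assumptions.

```python
import random


class CurriculumScheduler:
    """Illustrative sketch: adjust goal difficulty from student feedback,
    with no trained teacher model (assumed design, not the paper's CCL)."""

    def __init__(self, goals_by_difficulty, window=20,
                 promote_at=0.8, demote_at=0.3):
        # goals_by_difficulty: list of goal pools, ordered easy -> hard
        self.goals_by_difficulty = goals_by_difficulty
        self.level = 0                  # cold start from the easiest pool
        self.window = window            # episodes per evaluation window
        self.promote_at = promote_at    # success rate to move up a level
        self.demote_at = demote_at      # success rate to move down a level
        self.recent_outcomes = []

    def next_goal(self):
        """Sample the next user goal from the current difficulty pool."""
        return random.choice(self.goals_by_difficulty[self.level])

    def report(self, success: bool):
        """Record an episode outcome and adapt the difficulty level."""
        self.recent_outcomes.append(success)
        if len(self.recent_outcomes) < self.window:
            return
        rate = sum(self.recent_outcomes) / len(self.recent_outcomes)
        if rate >= self.promote_at and self.level < len(self.goals_by_difficulty) - 1:
            self.level += 1             # student succeeding: harder goals
        elif rate <= self.demote_at and self.level > 0:
            self.level -= 1             # student struggling: easier goals
        self.recent_outcomes.clear()
```

In a training loop, the scheduler would supply a goal for each episode and receive the dialogue outcome back (e.g. `scheduler.report(success)` after `run_dialogue_episode(policy, scheduler.next_goal())`, where the episode runner is an assumed environment interface). The point of the sketch is that the schedule is driven purely by cheap success-rate statistics rather than by a separately trained RL teacher.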
