Model discrepancy policy optimization for task-oriented dialogue

Zhenyou Zhou, Zhibin Liu, Zhaoan Dong, Yuhan Liu

doi:10.1016/j.csl.2024.101636

Abstract

Task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies, and interacting with user models can help the agent improve its generalization capacity. However, user models frequently lack the linguistic complexity of human interlocutors and contain generative errors, and their design biases can impair the agent’s performance in certain situations. In this paper, we incorporate an evaluator based on inverse reinforcement learning into the model to assess the dialogue quality of user models, so that high-quality user models can be recruited for training. By constructing a sampling distribution over environments that selects high-quality user models to participate in policy learning, we regulate the quality of training trajectories while maintaining their diversity. Evaluation on the MultiWOZ dataset demonstrates that the method improves the dialogue agent’s performance.
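The abstract does not say what form the sampling distribution takes or how the evaluator's output is scaled, so the following is only an illustrative sketch under stated assumptions. It assumes each user model has already received a scalar quality score from some evaluator, and it turns those scores into sampling probabilities with a temperature-controlled softmax. The softmax, the `temperature` parameter, the function names, and the user-model names and scores are all hypothetical and are not taken from the paper.

```python
import math
import random


def sampling_distribution(scores, temperature=1.0):
    """Map per-user-model quality scores to sampling probabilities.

    Assumption: a softmax over scores. Higher-scoring user models are
    sampled more often, but every model keeps a non-zero probability,
    which preserves some diversity in the training trajectories.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores.values())  # subtract the max for numerical stability
    weights = {
        name: math.exp((score - peak) / temperature)
        for name, score in scores.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def sample_user_model(probs, rng=random):
    """Draw one user model name according to the given probabilities."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical evaluator scores for three hypothetical user models.
scores = {"agenda_sim": 0.9, "neural_sim_a": 0.6, "neural_sim_b": 0.2}
probs = sampling_distribution(scores, temperature=0.5)

# Pick the user model for each of five training episodes.
rng = random.Random(0)
picked = [sample_user_model(probs, rng) for _ in range(5)]
```

In this sketch, a lower `temperature` concentrates sampling on the highest-scoring user models, while a higher one moves the distribution toward uniform; this is one simple way to trade trajectory quality against diversity, not necessarily the mechanism the authors use.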

Published in: Computer Speech & Language
