Multi-Action Dialog Policy Learning from Logged User Feedback

Shuo Zhang, Jing Tao, Junlan Feng, Pinghui Wang, Zi Liang, Yi Huang, Junzhou Zhao, Tianxiang Wang

doi:10.1609/aaai.v37i11.26636

Abstract

Multi-action dialog policy (MADP), which generates multiple atomic dialog actions per turn, has been widely applied in task-oriented dialog systems to provide expressive and efficient system responses. Existing MADP models usually imitate action combinations from labeled multi-action dialog samples. Due to data limitations, they generalize poorly to unseen dialog flows. Although reinforcement learning-based methods have been proposed to incorporate service ratings from real users and user simulators as external supervision signals, they suffer from sparse and less credible dialog-level rewards. To cope with this problem, we explore improving multi-action dialog policy learning (MADPL) with explicit and implicit turn-level user feedback received for historical predictions (i.e., logged user feedback), which is cost-efficient to collect and faithful to real-world scenarios. The task is challenging because logged user feedback provides only partial label feedback, limited to the particular historical dialog actions predicted by the agent. To fully exploit such feedback, we propose BanditMatch, which addresses the task from a feedback-enhanced semi-supervised learning (SSL) perspective with a hybrid learning objective that combines SSL and bandit learning. BanditMatch integrates pseudo-labeling methods to better explore the action space by constructing full label feedback. Extensive experiments show that BanditMatch improves MADPL over state-of-the-art methods by generating more concise and informative responses. The source code and the appendix of this paper are available at https://github.com/ShuoZhangXJTU/BanditMatch.
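
The idea of turning partial label feedback into full label feedback through pseudo-labeling can be illustrated with a small sketch. This is not the authors' BanditMatch implementation (see the repository linked above for that). It assumes, purely for illustration, that the policy outputs one probability per atomic dialog action, that logged feedback marks some previously predicted actions as accepted (1) or rejected (0), and that actions without feedback receive a pseudo-label only when the model is confident. The function names, the threshold `tau`, and the masked binary cross-entropy loss are all hypothetical choices.

```python
import math


def complete_labels(probs, logged_feedback, tau=0.9):
    """Build a full target vector from partial feedback (illustrative only).

    probs: predicted probability for each atomic dialog action.
    logged_feedback: dict {action_index: 0 or 1} for actions the agent
        predicted in the past and the user reacted to (assumed format).
    tau: hypothetical confidence threshold for pseudo-labeling.
    Returns (targets, mask); mask is 0.0 where no label could be assigned.
    """
    targets, mask = [], []
    for k, p in enumerate(probs):
        if k in logged_feedback:      # real user feedback takes priority
            targets.append(float(logged_feedback[k]))
            mask.append(1.0)
        elif p >= tau:                # confident positive pseudo-label
            targets.append(1.0)
            mask.append(1.0)
        elif p <= 1.0 - tau:          # confident negative pseudo-label
            targets.append(0.0)
            mask.append(1.0)
        else:                         # uncertain: leave out of the loss
            targets.append(0.0)
            mask.append(0.0)
    return targets, mask


def masked_bce(probs, targets, mask, eps=1e-7):
    """Mean binary cross-entropy over the actions that have a label."""
    total = 0.0
    for p, y, m in zip(probs, targets, mask):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        total -= m * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    n = sum(mask)
    return total / n if n else 0.0


# Four atomic actions; the user rejected action 3 in a logged turn.
probs = [0.95, 0.5, 0.03, 0.8]
targets, mask = complete_labels(probs, {3: 0})
loss = masked_bce(probs, targets, mask)
```

In this toy case, action 3 is labeled by the logged feedback, actions 0 and 2 receive confident pseudo-labels, and action 1 is masked out because the model is uncertain about it. A full hybrid objective would add a supervised term on the labeled dialog samples, which is omitted here.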

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 26, 2023
Citations: 1

Similar Papers

Unifying explicit and implicit feedback for collaborative filtering
Nathan N Liu ... Min Zhao
26 Oct 2010

Personalizing Activity Selection in Assistive Social Robots from Explicit and Implicit User Feedback
Marcos Maroto-Gómez ... Miguel Ángel Salichs
International Journal of Social Robotics
09 Apr 2024

Explicit feedback meet with implicit feedback in GPMF: a generalized probabilistic matrix factorization model for recommendation
Supriyo Mandal ... Abyayananda Maiti
Applied Intelligence | VOL. 50
21 Feb 2020

A systematic mapping study on crowdsourced requirements engineering using user feedback
Chong Wang ... Peng Liang
Journal of Software: Evolution and Process | VOL. 31
15 Jul 2019
