Hierarchical Reinforcement Learning for Open-Domain Dialog

Abdelrhman Saleh, Asma Ghandeharioun, Rosalind Picard, Judy Shen, Natasha Jaques

doi:10.1609/aaai.v34i05.6400

Open Access

https://doi.org/10.1609/aaai.v34i05.6400

Abstract

Open-domain dialog generation is a challenging problem: maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches that apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning (HRL), VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements – in terms of both human evaluation and automatic metrics – over state-of-the-art dialog models, including Transformers.
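To make the idea of applying policy gradients at the utterance level (rather than per word) more concrete, here is a minimal sketch. It is not the VHRL implementation: it assumes a Gaussian policy over a small utterance embedding z, a plain REINFORCE update with a moving-average baseline, and a made-up reward that only measures distance to a target embedding. The names toy_reward, reinforce_utterance_policy, and every hyperparameter are hypothetical and chosen only for illustration.

```python
import random


def toy_reward(z, target):
    """Hypothetical utterance-level reward: higher when z is close to target.

    Stands in for a conversation-level metric; it is NOT a reward from the paper.
    """
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def reinforce_utterance_policy(target, steps=4000, lr=0.005, sigma=0.5, seed=0):
    """Tune the mean of a Gaussian policy over an utterance embedding.

    One action is sampled per utterance, so the reward is credited to the
    whole utterance instead of to individual words.
    """
    rng = random.Random(seed)
    mu = [0.0] * len(target)  # policy mean, i.e. the embedding being tuned
    baseline = 0.0            # moving-average baseline to reduce variance
    for _ in range(steps):
        # Sample one utterance-level action z ~ N(mu, sigma^2 I).
        z = [rng.gauss(m, sigma) for m in mu]
        reward = toy_reward(z, target)
        advantage = reward - baseline
        # REINFORCE: grad of log N(z; mu, sigma^2 I) w.r.t. mu is (z - mu) / sigma^2.
        mu = [m + lr * advantage * (zi - m) / sigma ** 2 for m, zi in zip(mu, z)]
        baseline += 0.05 * (reward - baseline)
    return mu


if __name__ == "__main__":
    learned = reinforce_utterance_policy([1.0, -1.0])
    print("learned mean:", learned)
```

In this sketch the policy mean drifts toward the target because samples that score above the baseline pull the mean toward themselves. In a real dialog model the reward would come from conversation metrics, and the embedding would condition a decoder that generates the words.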

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 15
License type: CC BY-NC-SA
