Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

Bing Liu, Ian Lane, Tong Yu, Ole Mengshoel

doi:10.1609/aaai.v32i1.12028

Abstract

Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with a nonlinear reward function that uses distributed representations of text for online response selection. A bidirectional LSTM is used to produce the distributed representations of dialog context and responses, which serve as the input to a contextual bandit. In learning the bandit, we propose a customized Thompson sampling method that is applied to a polynomial feature space in approximating the reward. Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model using the Recall@k metric with a small set of online training samples.
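The abstract does not spell out the customized sampler, so the following is only a minimal sketch of ordinary Gaussian linear Thompson sampling applied to a degree-2 polynomial feature map of the concatenated context and response vectors, plus a Recall@k helper. Everything named here is an assumption for illustration: `poly2_features`, `PolyThompsonSampler`, `recall_at_k`, `simulate`, the hyperparameters `v` and `lam`, and the synthetic bilinear reward. Random vectors stand in for the BiLSTM encodings. The toy reward `c . r` shows why a polynomial space can help: it is linear in the degree-2 features but not in the raw concatenation `[c; r]`.

```python
import numpy as np


def poly2_features(x):
    """Degree-2 polynomial map: [1, x_1..x_n, x_i * x_j for i <= j]."""
    i, j = np.triu_indices(len(x))
    return np.concatenate(([1.0], x, x[i] * x[j]))


class PolyThompsonSampler:
    """Gaussian linear Thompson sampling over a fixed feature map.

    Generic textbook variant (an assumption), NOT the paper's customized sampler.
    """

    def __init__(self, dim, v=0.5, lam=1.0, seed=0):
        self.B = lam * np.eye(dim)  # posterior precision (ridge prior lam * I)
        self.f = np.zeros(dim)      # running sum of reward * features
        self.v = v                  # scales posterior spread, i.e. exploration
        self.rng = np.random.default_rng(seed)

    def mean(self):
        """Posterior mean of the weight vector, B^{-1} f."""
        return np.linalg.solve(self.B, self.f)

    def select(self, features):
        """Sample theta ~ N(mean, v^2 B^{-1}) and return the best-scoring row."""
        L = np.linalg.cholesky(self.B)  # B = L L^T, so L^{-T} z has covariance B^{-1}
        z = self.rng.standard_normal(len(self.f))
        theta = self.mean() + self.v * np.linalg.solve(L.T, z)
        return int(np.argmax(features @ theta))

    def update(self, phi, reward):
        """Rank-one posterior update using only the chosen arm's features."""
        self.B += np.outer(phi, phi)
        self.f += reward * phi


def recall_at_k(scores, true_idx, k):
    """1.0 if the true candidate is among the k highest scores, else 0.0."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(true_idx in top_k)


def simulate(rounds=300, n_candidates=5, d=4, seed=1):
    """Toy online loop; random vectors stand in for BiLSTM encodings."""
    rng = np.random.default_rng(seed)
    bandit = PolyThompsonSampler(len(poly2_features(np.zeros(2 * d))), seed=seed)
    hits = []
    for _ in range(rounds):
        c = rng.standard_normal(d)                  # hypothetical context vector
        R = rng.standard_normal((n_candidates, d))  # hypothetical response vectors
        feats = np.array([poly2_features(np.concatenate([c, r])) for r in R])
        arm = bandit.select(feats)
        # Hypothetical reward: bilinear match c . r plus Gaussian noise.
        bandit.update(feats[arm], R[arm] @ c + 0.1 * rng.standard_normal())
        hits.append(float(arm == np.argmax(R @ c)))  # 1.0 if best response chosen
    return bandit, hits
```

After `bandit, hits = simulate()`, scoring a fresh set of candidates with `feats @ bandit.mean()` and passing the scores to `recall_at_k` gives a Recall@k estimate for the learned scorer. The paper's actual feature map, prior, and sampling customization may differ from this sketch.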



Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Apr 27, 2018
Citations: 4

Similar Papers

Graph and Neural Network-Based Intelligent Conversation System
Anuja Arora ... Aman Srivastava
01 Jan 2019

A Hybrid RNN-CNN Encoder for Neural Conversation Model
Zhiyuan Ma ... Wenge Rong
01 Jan 2018

Identifying Untrustworthy Samples
Lei Shen ... Xin Shen
26 Oct 2021

Neural Conversation Model Controllable by Given Dialogue Act Based on Adversarial Learning and Label-aware Objective
Seiya Kawano ... Satoshi Nakamura
01 Jan 2019
