Abstract

We propose a method that leverages unlabeled data to learn a matching model for response selection in retrieval-based chatbots. The method employs a sequence-to-sequence (Seq2Seq) model as a weak annotator to judge the matching degree of unlabeled message-response pairs, and then performs learning with both the weak signals and the unlabeled data. Experimental results on two public data sets indicate that matching models achieve significant improvements when learned with the proposed method.
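To make the idea concrete, here is a minimal sketch of the weak-annotation step. It assumes a trained Seq2Seq model exposing a log-likelihood score log P(response | message); the `seq2seq_log_prob` function below is a hypothetical stand-in (a word-overlap heuristic) so the sketch runs without a real model, and the softmax normalization is one plausible way to turn scores into weights, not necessarily the paper's exact formulation.

```python
import math

def seq2seq_log_prob(message, response):
    # Hypothetical stand-in for log P(response | message) from a trained
    # Seq2Seq model; a word-overlap heuristic keeps the sketch runnable.
    msg, resp = set(message.split()), set(response.split())
    overlap = len(msg & resp) / max(len(resp), 1)
    return math.log(overlap + 1e-6)

def weak_labels(pairs):
    """Turn Seq2Seq scores for unlabeled (message, response) pairs into
    normalized matching weights in [0, 1] via a softmax over the batch."""
    scores = [seq2seq_log_prob(m, r) for m, r in pairs]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

pairs = [
    ("how is the weather today", "the weather is sunny today"),
    ("how is the weather today", "i like playing football"),
]
weights = weak_labels(pairs)
# The on-topic response should receive the larger weight.
assert weights[0] > weights[1]
```

The matching model can then be trained on the unlabeled pairs with these weights scaling each pair's contribution to the loss, so that responses the weak annotator judges as good matches influence learning more strongly.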

Highlights

  • More and more attention from both academia and industry is being paid to building non-task-oriented chatbots that can naturally converse with humans on any open-domain topic

  • Existing approaches can be categorized into generation-based methods (Shang et al., 2015; Vinyals and Le, 2015; Serban et al., 2016; Sordoni et al., 2015; Xing et al., 2017; Serban et al., 2017; Xing et al., 2018), which synthesize a response with natural language generation techniques, and retrieval-based methods (Hu et al., 2014; Lowe et al., 2015; Yan et al., 2016; Zhou et al., 2016; Wu et al., 2017), which select a response from a pre-built index

  • We conduct experiments on two public data sets, and experimental results on both data sets indicate that models learned with our method can significantly outperform their counterparts learned with the random sampling strategy

Introduction

More and more attention from both academia and industry is being paid to building non-task-oriented chatbots that can naturally converse with humans on any open-domain topic. While existing research focuses on how to define a matching model with neural networks, little attention has been paid to how to learn such a model when few labeled data are available. A common practice is to transform the matching problem into a classification problem, with human responses as positive examples and randomly sampled ones as negative examples. This strategy oversimplifies the learning problem: most randomly sampled responses are either far from the semantics of the messages or contexts, or they are false negatives that pollute the training data as noise. As a result, there often exists a significant gap between the performance of a model in training and the same model in practice (Wang et al., 2015; Wu et al., 2017).

