Abstract

We consider the task of generating dialogue responses from background knowledge comprising domain-specific resources. Specifically, given a conversation around a movie, the task is to generate the next response based on background knowledge about the movie, such as the plot, reviews, and Reddit comments. This requires capturing structural, sequential, and semantic information from the conversation context and the background resources. We propose a new architecture that uses BERT's ability to capture deep contextualized representations in conjunction with explicit structure and sequence information. More specifically, we use (i) Graph Convolutional Networks (GCNs) to capture structural information, (ii) LSTMs to capture sequential information, and (iii) BERT for the deep contextualized representations that capture semantic information. We analyze the proposed architecture extensively. To this end, we propose a plug-and-play Semantics-Sequences-Structures (SSS) framework that allows us to combine such linguistic information effectively. Through a series of experiments we make some interesting observations. First, we observe that the popular adaptation of the GCN model for NLP tasks, in which structural information (GCNs) is added on top of sequential information (LSTMs), performs poorly on our task. This leads us to explore alternative ways of combining semantic and structural information to improve performance. Second, we observe that while BERT already outperforms other deep contextualized representations such as ELMo, it still benefits from the additional structural information explicitly added using GCNs. This is somewhat surprising given recent claims that BERT already captures structural information. Lastly, the proposed SSS framework gives an improvement of 7.95% in BLEU score over the baseline.

Highlights

  • We consider the task of generating dialogue responses from background knowledge comprising domain-specific resources

  • These representations h_1, h_2, h_3, ..., h_m are fed to the M-GCN (Graph Convolutional Network) along with the graph G to compute a k-hop aggregated representation: h_i^{str} = M-GCN(h_1, ..., h_m, G). The final representation h_i^{final} = h_i^{str} for the i-th word combines semantic, sequential, and structural information, in that order. This is a popular way of combining GCNs with LSTMs, but our experiments suggest that it does not work well for our task

  • We report the performance of the BiRNN + GCN architecture that uses the dependency graph only as discussed in (Marcheggiani and Titov, 2017)
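The k-hop aggregation described in the highlights can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: the function names (`gcn_hop`, `m_gcn`), the toy dependency chain, and the dimensions are all hypothetical, and the sketch uses a simple row-normalized adjacency with self-loops in place of whatever normalization the actual M-GCN uses.

```python
import numpy as np

def gcn_hop(H, A_hat, W):
    # One GCN hop: aggregate neighbour features through the normalised
    # adjacency matrix A_hat, project with W, then apply ReLU.
    return np.maximum(A_hat @ H @ W, 0.0)

def m_gcn(H, A, weights):
    # k-hop aggregation: stack k GCN hops (k = len(weights)).
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalise
    for W in weights:
        H = gcn_hop(H, A_hat, W)
    return H

rng = np.random.default_rng(0)
m, d = 5, 8                        # 5 words, 8-dim vectors (toy sizes)
H = rng.standard_normal((m, d))    # stand-in for BiLSTM outputs h_1..h_m
A = np.zeros((m, m))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:  # a toy dependency chain
    A[i, j] = A[j, i] = 1.0
weights = [rng.standard_normal((d, d)) for _ in range(2)]  # k = 2 hops
H_str = m_gcn(H, A, weights)       # h_i^{str}: one row per word
print(H_str.shape)                 # (5, 8)
```

Row i of `H_str` plays the role of h_i^{str}: the sequential encoding of word i enriched with information from its k-hop graph neighbourhood.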


Summary

Introduction

We consider the task of generating dialogue responses from background knowledge comprising domain-specific resources. Given a conversation around a movie, the task is to generate the next response based on background knowledge about the movie, such as the plot, reviews, and Reddit comments. This requires capturing structural, sequential, and semantic information from the conversation context and the background resources. We observe that the popular adaptation of the GCN model for NLP tasks, in which structural information (GCNs) is added on top of sequential information (LSTMs), performs poorly on our task. This leads us to explore interesting ways of combining semantic and structural information to improve performance. The Syntactic-GCN proposed in (Marcheggiani and Titov, 2017) is a GCN (Kipf and Welling, 2017) variant that can model multiple edge types and edge directions, and it can dynamically determine the importance of an edge.
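To make the Syntactic-GCN idea concrete, here is a minimal NumPy sketch of one edge-gated hop: each edge carries a direction-specific projection and a label-specific bias, and a scalar sigmoid gate lets the layer down-weight unreliable (e.g. auto-parsed) edges. All names, labels, and dimensions below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def syntactic_gcn_hop(H, edges, params):
    # edges: list of (u, v, direction, label); direction in {"in","out","self"}.
    # Each message from u to v is scaled by a scalar gate in (0, 1), so the
    # layer dynamically decides how much each edge should contribute.
    out = np.zeros_like(H)
    for u, v, direction, label in edges:
        W = params["W"][direction]      # direction-specific projection
        b = params["b"][label]          # label-specific bias
        gate = sigmoid(H[u] @ params["w_gate"][direction]
                       + params["b_gate"][label])
        out[v] += gate * (W @ H[u] + b)
    return np.maximum(out, 0.0)         # ReLU

rng = np.random.default_rng(1)
m, d = 4, 6                             # 4 words, 6-dim vectors (toy sizes)
H = rng.standard_normal((m, d))
params = {
    "W": {k: rng.standard_normal((d, d)) for k in ("in", "out", "self")},
    "b": {"nsubj": rng.standard_normal(d), "obj": rng.standard_normal(d),
          "self": np.zeros(d)},
    "w_gate": {k: rng.standard_normal(d) for k in ("in", "out", "self")},
    "b_gate": {"nsubj": 0.0, "obj": 0.0, "self": 0.0},
}
edges = [(0, 1, "out", "nsubj"), (1, 0, "in", "nsubj"),
         (2, 1, "out", "obj"), (1, 2, "in", "obj")] + \
        [(i, i, "self", "self") for i in range(m)]
H_new = syntactic_gcn_hop(H, edges, params)
print(H_new.shape)                      # (4, 6)
```

Listing each dependency arc in both directions (plus self-loops) is what lets a single hop pass information both from heads to dependents and back, while the per-edge gates model edge importance.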

