Discourse Relation Prediction: Revisiting Word Pairs with Convolutional Networks

Siddharth Varia, Tuhin Chakrabarty, Christopher Hidey

doi:10.18653/v1/w19-5951

Abstract

Word pairs across argument spans have been shown to be effective for predicting the discourse relation between them. We propose an approach to distill knowledge from word pairs for discourse relation classification with convolutional neural networks by incorporating joint learning of implicit and explicit relations. Our novel approach of representing the input as word pairs achieves state-of-the-art results on four-way classification of both implicit and explicit relations as well as one of the binary classification tasks. For explicit relation prediction, we achieve around 20% error reduction on the four-way task. At the same time, compared to a two-layered Bi-LSTM-CRF model, our model is able to achieve these results with half the number of learnable parameters and approximately half the amount of training time.

Highlights

Implicit discourse relation identification is the task of recognizing the relationship between text segments without the use of an explicit connective indicating the relationship.
We compare our results to previous work along two dimensions: the architecture of the model (CNN or LSTM) and whether the model employs a joint learning component.
We proposed an approach to learn implicit relations by incorporating word pair features as a novel way to capture the interaction between the arguments, a distinct approach compared to the popular attention-based approaches used with BiLSTM-based models.

Summary

Introduction

Implicit discourse relation identification is the task of recognizing the relationship between text segments without the use of an explicit connective indicating the relationship. Without such a connective, automatically identifying the relationship is much more difficult. Better identification of implicit discourse relations will improve performance in downstream tasks such as question answering, textual inference (for determining relationships between text segments), machine translation, and other multilingual tasks (for transferring discourse information between languages). The Penn Discourse Treebank (PDTB) theory of discourse relations (Prasad et al., 2008) defines a shallow discourse representation between adjacent or nearby segments. The span of the arguments participating in the discourse relation is often the most important input to a classifier.
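
To make the word-pair input mentioned in the abstract concrete, the following is a minimal sketch of how pairs across two argument spans might be enumerated. It assumes the pairs are the Cartesian product of the tokens in the two arguments and that lowercased whitespace tokenization is sufficient. The function name `word_pairs` and the example sentences are hypothetical and are not taken from the paper, and the sketch does not cover the convolutional model itself.

```python
from itertools import product


def word_pairs(arg1, arg2):
    """Return every (w1, w2) pair with w1 from arg1 and w2 from arg2.

    Assumption: lowercased whitespace tokenization; a real system would
    likely use a proper tokenizer and map tokens to embeddings.
    """
    tokens1 = arg1.lower().split()
    tokens2 = arg2.lower().split()
    # Cartesian product: each word of the first argument is paired
    # with each word of the second argument.
    return list(product(tokens1, tokens2))


# Hypothetical argument spans with no explicit connective between them.
pairs = word_pairs("It rained heavily", "The match was cancelled")
print(len(pairs))   # 3 tokens x 4 tokens = 12 pairs
print(pairs[:2])    # [('it', 'the'), ('it', 'match')]
```

The number of pairs grows with the product of the two argument lengths, which is one reason a model over word pairs would need some form of pooling or truncation; how the paper handles this is not stated in this summary.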

Methods

Results

Discussion

Conclusion