A Refined End-to-End Discourse Parser

Jianxiang Wang, Man Lan

doi:10.18653/v1/k15-2002

Abstract

The CoNLL-2015 shared task focuses on shallow discourse parsing, which takes a piece of newswire text as input and returns its discourse relations in PDTB style. In this paper, we describe the discourse parser that we entered in the shared task. The parser is built from 9 components that identify discourse connectives, label arguments and classify the sense of Explicit or Non-Explicit relations in free text. Compared with previous discourse parsers, our system adds new components and features, which further improve overall performance. Our parser ranks first on both test datasets, i.e., PDTB Section 23 and a blind test dataset.

Highlights

An end-to-end discourse parser is given free texts as input and returns discourse relations in a Penn Discourse Treebank (PDTB) style, where a connective acts as a predicate that takes two text spans as its arguments.
We find that the F1 scores of all these classifiers are increased by adding our new features (+new).
We see that the F1 of PS is improved by a large margin for Arg1, Arg2 and Both by using two separate PS argument extractors, and the overall F1 of Explicit arguments extraction is increased by 2.51%.

Summary

Introduction

An end-to-end discourse parser is given free texts as input and returns discourse relations in a PDTB style, where a connective acts as a predicate that takes two text spans as its arguments. It can benefit many downstream NLP applications, such as information retrieval, question answering and automatic summarization. To distinguish discourse connectives from non-discourse ones and to classify Explicit relations, Pitler and Nenkova (2009) extracted syntactic features of connectives from constituent parses and showed that these features improved performance in both subtasks. For Implicit sense classification, Pitler et al. (2009), Lin et al. (2009) and Rutherford and Xue (2014) performed the classification using several linguistically informed features, such as verb classes, production rules and Brown cluster pairs. Lan et al. (2013) presented a multi-task learning framework that uses the prediction of explicit discourse connectives as auxiliary learning tasks to improve the performance.
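To make the predicate-argument view of a relation concrete, the following is a minimal sketch of how a single PDTB-style relation might be represented. It is not the shared-task output format or the data structure used in our system: the class name `DiscourseRelation`, its field names, the `as_predicate` helper, the example sentence and the sense label are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiscourseRelation:
    """Hypothetical container for one PDTB-style relation (illustration only)."""

    relation_type: str                 # e.g. "Explicit" or "Non-Explicit"
    sense: str                         # illustrative sense label
    arg1: str                          # text span of the first argument
    arg2: str                          # text span of the second argument
    connective: Optional[str] = None   # None when no connective is present

    def as_predicate(self) -> str:
        """Render the relation as connective(Arg1, Arg2)."""
        head = self.connective if self.connective is not None else "<no connective>"
        return f"{head}({self.arg1!r}, {self.arg2!r})"


# Invented example sentence: "The game was cancelled because it rained."
example = DiscourseRelation(
    relation_type="Explicit",
    sense="Contingency.Cause",
    arg1="the game was cancelled",
    arg2="it rained",
    connective="because",
)

print(example.as_predicate())
```

In this sketch the connective is optional so that the same container can also hold Non-Explicit relations, for which only the two arguments and a sense are available.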

Methods

Results

Conclusion