Abstract

Previous research on text-level discourse parsing has mainly made use of constituency structure to parse a whole document into one discourse tree. In this paper, we present the limitations of constituency-based discourse parsing and propose, for the first time, to use dependency structure to directly represent the relations between elementary discourse units (EDUs). State-of-the-art dependency parsing techniques, the Eisner algorithm and the maximum spanning tree (MST) algorithm, are adopted to parse an optimal discourse dependency tree based on the arc-factored model and large-margin learning techniques. Experiments show that our discourse dependency parsers achieve competitive performance on text-level discourse parsing.

Highlights

  • It is widely agreed that no unit of a text can be understood in isolation, but only in relation to its context

  • The rhetorical relations in Rhetorical Structure Theory (RST) trees are kept as the functional relations that link two Elementary Discourse Units (EDUs) in dependency trees

  • Following Feng and Hirst (2012), Lin et al. (2009), and Hernault et al. (2010b), we explore the following 6 feature types, combined with relations, to represent each labeled arc. (1) WORD: the first word, the last word, and the first bigram of each EDU, along with the pair of the first words and the pair of the last words of the two EDUs, are extracted as features
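The WORD feature type described above can be sketched as a simple extraction routine. This is an illustrative reconstruction, not the authors' code: the function name, the `H:`/`D:` prefixes, and the feature-string format are all hypothetical choices.

```python
def word_features(edu_head, edu_dep):
    """Sketch of the WORD feature type from the paper: the first word,
    last word, and first bigram of each EDU, plus the pair of first
    words and the pair of last words across the two EDUs.

    EDUs are given as token lists; prefixes H (head) and D (dependent)
    and the string templates are illustrative assumptions.
    """
    feats = []
    for tag, edu in (("H", edu_head), ("D", edu_dep)):
        feats.append(f"{tag}:first={edu[0]}")
        feats.append(f"{tag}:last={edu[-1]}")
        # first bigram; pad when the EDU has a single token
        second = edu[1] if len(edu) > 1 else "<none>"
        feats.append(f"{tag}:bigram={edu[0]}_{second}")
    feats.append(f"pair:first={edu_head[0]}_{edu_dep[0]}")
    feats.append(f"pair:last={edu_head[-1]}_{edu_dep[-1]}")
    return feats
```

In an arc-factored model, such features would be conjoined with the candidate relation label to score each labeled arc.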


Summary

Introduction

It is widely agreed that no unit of a text can be understood in isolation, but only in relation to its context. In Rhetorical Structure Theory (RST), the leaves of a discourse tree correspond to contiguous text spans called Elementary Discourse Units (EDUs). The different levels of discourse units (e.g., EDUs or larger text spans) occurring in the generative process are better represented with different features, so a uniform framework for discourse analysis is hard to develop. We instead adopt graph-based dependency parsing techniques learned from large sets of annotated dependency trees. The Eisner (1996) algorithm and the maximum spanning tree (MST) algorithm are used to parse the optimal projective and non-projective dependency trees respectively, with the large-margin learning technique of Crammer and Singer (2003).
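The Eisner algorithm mentioned above finds the highest-scoring projective dependency tree in O(n³) time under an arc-factored model, where the score of a tree is the sum of its arc scores. The following is a minimal sketch that returns only the best score (recovering the tree itself would additionally require backpointers); the function name and score-matrix convention are assumptions for illustration, with index 0 serving as an artificial root (here standing in for the root EDU of a document).

```python
def eisner_best_score(score):
    """Arc-factored Eisner algorithm (sketch).

    score[h][m] is the score of an arc from head h to modifier m;
    index 0 is an artificial root. Returns the score of the best
    projective dependency tree; invalid arcs can be set to -inf.
    """
    n = len(score)
    NEG = float("-inf")
    # chart[s][t][d][c]: best score of span (s, t);
    # d=1 -> head at s, d=0 -> head at t;
    # c=1 -> complete span, c=0 -> incomplete (still taking modifiers).
    chart = [[[[NEG, NEG], [NEG, NEG]] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        for d in (0, 1):
            for c in (0, 1):
                chart[s][s][d][c] = 0.0
    for k in range(1, n):          # span width
        for s in range(n - k):
            t = s + k
            # incomplete spans: add arc s -> t (d=1) or t -> s (d=0)
            best = max(chart[s][r][1][1] + chart[r + 1][t][0][1]
                       for r in range(s, t))
            chart[s][t][0][0] = best + score[t][s]
            chart[s][t][1][0] = best + score[s][t]
            # complete spans: absorb an already-built incomplete span
            chart[s][t][0][1] = max(chart[s][r][0][1] + chart[r][t][0][0]
                                    for r in range(s, t))
            chart[s][t][1][1] = max(chart[s][r][1][0] + chart[r][t][1][1]
                                    for r in range(s + 1, t + 1))
    return chart[0][n - 1][1][1]
```

For non-projective trees, the MST (Chu-Liu/Edmonds) algorithm plays the analogous role, selecting the maximum spanning arborescence over the complete arc-score graph.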

Discourse Dependency Structure
Our Discourse Dependency Treebank
System Overview
Eisner Algorithm
Maximum Spanning Tree Algorithm
Learning
Features
MIRA based Learning
Preparation
Feature Influence on Two Relation Sets
Method Features
Comparison with Other Systems
Findings
Conclusions

