Abstract

Implicit relation classification on the Penn Discourse TreeBank (PDTB) 2.0 is a common benchmark task for evaluating the understanding of discourse relations. However, inconsistent preprocessing and evaluation make it difficult to compare results in the literature fairly. In this work, we highlight these inconsistencies and propose an improved evaluation protocol. Paired with this protocol, we report strong baseline results from pretrained sentence encoders, which set a new state of the art for PDTB 2.0. Furthermore, this work is the first to explore fine-grained relation classification on PDTB 3.0. We expect our work to serve as a point of comparison for future work, and also as an initiative to discuss models of larger context and possible data augmentations for downstream transferability.

Highlights

  • Understanding discourse relations in natural language text is crucial to end tasks involving larger context, such as question-answering (Jansen et al, 2014) and conversational systems grounded on documents (Saeidi et al, 2018; Feng et al, 2020).

  • One way to characterize discourse is through relations between two spans or arguments (ARG1/ARG2), as in the Penn Discourse TreeBank (PDTB) (Prasad et al, 2008, 2019).

  • We present a set of strong baselines from pretrained sentence encoders on both PDTB 2.0 and 3.0 that set a new state of the art.

Summary

Introduction

Understanding discourse relations in natural language text is crucial to end tasks involving larger context, such as question-answering (Jansen et al, 2014) and conversational systems grounded on documents (Saeidi et al, 2018; Feng et al, 2020). We highlight preprocessing and evaluation inconsistencies in works using PDTB 2.0 for implicit discourse relation classification. We report state-of-the-art results on both top-level and second-level implicit discourse relation classification on PDTB 2.0, and the first set of results on PDTB 3.0. We expect these results to serve as simple but strong baselines that motivate future work. In PDTB, two text spans in a discourse relation are labeled with either one or two senses from a three-level sense hierarchy. The new version of the dataset, PDTB 3.0 (Prasad et al, 2019), introduces a new annotation scheme with a revised sense hierarchy as well as 13K additional datapoints. The third level of the sense hierarchy is modified to contain only asymmetric (or directional) senses.
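To make the labeling scheme concrete, the following is a minimal sketch of how one or two hierarchical senses attached to an argument pair could be reduced to top-level or second-level classification targets. It assumes senses are written as dot-separated strings (for example `Contingency.Cause.Reason`); the function names `truncate_sense` and `gold_labels` are hypothetical and are not taken from the paper or from any PDTB tooling.

```python
from typing import List, Optional


def truncate_sense(sense: str, level: int) -> Optional[str]:
    """Truncate a dot-separated sense to `level` (1 = top level).

    Returns None when the annotation is coarser than the requested level.
    The dot-separated format is an assumption made for this sketch.
    """
    parts = sense.split(".")
    if len(parts) < level:
        return None
    return ".".join(parts[:level])


def gold_labels(senses: List[str], level: int) -> List[str]:
    """Collect the distinct labels at `level` for an example annotated
    with one or two senses, preserving annotation order."""
    labels: List[str] = []
    for sense in senses:
        label = truncate_sense(sense, level)
        if label is not None and label not in labels:
            labels.append(label)
    return labels


# An illustrative (made-up) example carrying two senses.
example_senses = ["Contingency.Cause.Reason", "Expansion.Conjunction"]
top_level = gold_labels(example_senses, level=1)
second_level = gold_labels(example_senses, level=2)
third_level = gold_labels(example_senses, level=3)
```

In this sketch the second sense has no third-level component, so it contributes no label at level 3. How such examples, and examples with two senses, are counted during training and evaluation is exactly the kind of preprocessing choice that can vary between papers.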

Variation in preprocessing and evaluation
Proposed Evaluation Protocol
Baseline results
Single-span baselines
Discussion: where should we go next?
Conclusion
A Dataset Statistics
B List of Splits in Prior Work
C Training Details
D Top-level Sense Classification Results
E Single-span Baselines for L2 Classification
F Cross-validation and Randomized validation
Findings
G Additional Error Analyses
