Abstract

Discourse representation structures (DRSs) are scoped semantic representations for texts of arbitrary length. Evaluating the accuracy of predicted DRSs plays a key role in developing semantic parsers and improving their performance. DRSs are typically visualized as boxes, which are not straightforward to process automatically. COUNTER transforms DRSs to clauses and measures clause overlap by searching for variable mappings between two DRSs. However, this metric is computationally costly (in both memory and CPU time) and does not scale to longer texts. We introduce DSCORER, an efficient new metric that converts box-style DRSs to graphs and then measures the overlap of n-grams. Experiments show that DSCORER computes accuracy scores that are correlated with those of COUNTER at a fraction of the time.
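
The cost of a mapping-based metric comes from the search over variable mappings. The sketch below makes that search exhaustive to show how large the space is: with k variables on each side there are k! injective mappings to try, already 3,628,800 for ten variables. It is not COUNTER's implementation, which need not enumerate every mapping; the clause tuples, the variable-naming rule in `is_var`, and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations

def is_var(token):
    # Hypothetical convention: a variable is a lowercase letter plus digits (b3, e2, x1).
    return len(token) > 1 and token[0].islower() and token[1:].isdigit()

def variables(clauses):
    return sorted({t for clause in clauses for t in clause if is_var(t)})

def best_mapping_overlap(pred, gold):
    """Exhaustively try every injective variable mapping from pred to gold.

    Returns (largest number of matching clauses, number of mappings tried).
    """
    pred_vars, gold_vars = variables(pred), variables(gold)
    # If pred has more variables than gold, add fresh targets that match nothing.
    pad = max(0, len(pred_vars) - len(gold_vars))
    targets = gold_vars + [f"_unmatched{i}" for i in range(pad)]
    gold_counts = Counter(gold)
    best, tried = 0, 0
    for image in permutations(targets, len(pred_vars)):
        tried += 1
        mapping = dict(zip(pred_vars, image))
        mapped = Counter(tuple(mapping.get(t, t) for t in clause) for clause in pred)
        # Counter & keeps the minimum count of each clause, i.e. clipped matches.
        best = max(best, sum((mapped & gold_counts).values()))
    return best, tried

gold = [("b1", "NOT", "b2"), ("b2", "sing", "e1")]
pred = [("b7", "NOT", "b8"), ("b8", "sing", "e5")]
print(best_mapping_overlap(pred, gold))  # (2, 6)
```

Renaming variables does not hurt the score, but every extra variable multiplies the number of mappings, which is why this style of search becomes slow on document-level DRSs.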

Highlights

  • Discourse Representation Theory (DRT) is a popular theory of meaning representation (Kamp, 1981; Kamp and Reyle, 2013; Asher, 1993; Asher et al., 2003) designed to account for a variety of linguistic phenomena within and across sentences

  • DSCORER converts box-style Discourse Representation Structures (DRSs) to a graph format used for evaluation and computes F1 with high efficiency (7,000 times faster than COUNTER)

  • Due to the high sparsity of graphs typical for DRSs, the number of n-grams does not explode as the graph size |G| = |N| + |E| increases, where |N| and |E| are the numbers of nodes and edges in graph G, respectively
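
Both points can be illustrated with a small sketch. It computes an F1 score from clipped n-gram counts, and counts n-edge walks with a short dynamic program to show why sparsity keeps the number of n-grams small: in a chain each node has at most one outgoing edge, so the count grows linearly with |N|, while in a dense graph with out-degree d it grows as |N|·d^n. The names `ngram_f1` and `count_walks` are hypothetical, and the sketch scores a single n-gram order; how DSCORER combines several orders is not stated here.

```python
from collections import Counter

def ngram_f1(pred_ngrams, gold_ngrams):
    """F1 between two multisets of n-grams, with clipped (minimum) match counts."""
    pred, gold = Counter(pred_ngrams), Counter(gold_ngrams)
    matched = sum((pred & gold).values())  # Counter & keeps min(count_pred, count_gold)
    if matched == 0:
        return 0.0
    precision = matched / sum(pred.values())
    recall = matched / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

def count_walks(edges, n):
    """Number of directed walks with exactly n edges (equals paths if the graph is acyclic)."""
    nodes = {v for edge in edges for v in edge}
    ends = {v: 1 for v in nodes}  # walks with 0 edges: one per node
    for _ in range(n):
        extended = {v: 0 for v in nodes}
        for src, dst in edges:
            extended[dst] += ends[src]  # extend every walk ending at src by src -> dst
        ends = extended
    return sum(ends.values())

chain = [(i, i + 1) for i in range(9)]                            # 10 nodes, sparse
dense = [(a, b) for a in range(10) for b in range(10) if a != b]  # 10 nodes, complete
print(count_walks(chain, 3), count_walks(dense, 3))  # prints: 7 7290
```

With the same ten nodes, the sparse chain yields 7 trigram-length walks while the complete graph yields 7,290, so enumerating n-grams stays cheap as long as the graphs remain sparse.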

Summary

Introduction

Discourse Representation Theory (DRT) is a popular theory of meaning representation (Kamp, 1981; Kamp and Reyle, 2013; Asher, 1993; Asher et al., 2003) designed to account for a variety of linguistic phenomena within and across sentences.

The existing metric, COUNTER, searches for variable mappings between two DRSs, which incurs unacceptable runtime, especially when evaluating document-level DRSs with a large number of variables. Another problem with the current evaluation is that COUNTER only considers local clauses without taking larger window sizes into account: it treats "b4 sing e2" and "b3 NOT b4" as separate semantic units, even though they share the box variable b4. In DRS parsing, there is a single correct semantic representation (the gold-standard reference) and no limit on the maximum size of DRSs.

Our proposed metric, DSCORER, converts box-style DRSs to a graph format used for evaluation and computes F1 with high efficiency (7,000 times faster than COUNTER). We release our code implementing the metric at https://github.com/LeonCrashCode/DRSScorer.
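
The benefit of a larger window can be sketched on the two clauses quoted above. Assuming, purely for illustration, that each three-field clause becomes one labelled edge from its first to its last variable, and that variable indices are dropped from node labels, a walk of two edges joins "b3 NOT b4" and "b4 sing e2" into a single n-gram that clause-level matching would score as two separate units. This is not DSCORER's published graph induction; `node_label`, `clauses_to_graph`, and `ngrams` are hypothetical names, and the parser only accepts three-field clauses like the ones quoted.

```python
from collections import defaultdict

def node_label(var):
    # Hypothetical: drop the variable index (b4 -> b) so that two graphs can be
    # compared without first searching for a variable mapping.
    return var.rstrip("0123456789")

def clauses_to_graph(clauses):
    # Hypothetical simplification: each three-field clause "src REL dst"
    # becomes one directed edge src -REL-> dst.
    graph = defaultdict(list)
    for clause in clauses:
        src, rel, dst = clause.split()
        graph[src].append((rel, dst))
    return graph

def ngrams(graph, n):
    """Label sequences of all directed walks with exactly n edges (n >= 1)."""
    found = []

    def walk(node, labels, depth):
        if depth == n:
            found.append(tuple(labels))
            return
        for rel, nxt in graph.get(node, []):
            walk(nxt, labels + [rel, node_label(nxt)], depth + 1)

    for start in list(graph):
        walk(start, [node_label(start)], 0)
    return found

g = clauses_to_graph(["b3 NOT b4", "b4 sing e2"])
print(ngrams(g, 1))  # [('b', 'NOT', 'b'), ('b', 'sing', 'e')]
print(ngrams(g, 2))  # [('b', 'NOT', 'b', 'sing', 'e')]
```

The unigrams correspond to the two separate clauses, while the single bigram records that the singing event sits inside the negated box.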

Graph Induction
Evaluation Based on n-grams
Experiments
Metric Settings
Analysis
Case Study
Related Work
Conclusions
