Abstract
Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories’ internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.
Highlights
Combinatory Categorial Grammar (CCG; Steedman, 2000) is a strongly-lexicalized grammar formalism in which rich syntactic categories at the lexical level impose tight constraints on the constituents that can be formed.
To test how well the models really generalize to the long tail, we evaluate them on alternatively sampled training and evaluation splits of the WSJ data (Table 6) as well as in domains diverging from the WSJ training data.
We find that the supertags with the longest dependencies on average largely function as subordinators, sentence adverbials, and inverted speech verbs, such as (S[dcl]/S[dcl])/NP.
Summary
Combinatory Categorial Grammar (CCG; Steedman, 2000) is a strongly-lexicalized grammar formalism in which rich syntactic categories at the lexical level impose tight constraints on the constituents that can be formed. Most CCG parsers operate as a pipeline whose first task is ‘supertagging’, i.e., sequence labeling with a large search space of complex ‘supertags’ (Clark and Curran, 2004; Xu et al., 2015; Vaswani et al., 2016, inter alia). All that remains to parsing is applying general rules of (binary) combination between adjacent constituents until the entire input is covered. In contrast to the simpler task of part-of-speech tagging, supertaggers are required to resolve most of the syntactic ambiguity in the input.
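To make the tree-structured nature of supertags concrete, the following is a minimal illustrative sketch (not the paper’s implementation): a CCG category is either an atom (e.g. NP, S[dcl]) or a functor built from a result, a slash direction, and an argument, and forward application (X/Y combined with Y yielding X) is one of the general binary combination rules the summary mentions. All class and function names here are our own, chosen for the example.

```python
# Sketch of CCG supertags as binary trees, with forward application.
# Assumption: categories are either atomic or functor-shaped; names
# (Atom, Functor, forward_apply) are illustrative, not from the paper.

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    name: str  # e.g. "NP" or "S[dcl]"

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    result: "Category"
    slash: str  # "/" looks rightward, "\\" looks leftward
    argument: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.argument})"


Category = Union[Atom, Functor]


def forward_apply(fn: Category, arg: Category) -> Optional[Category]:
    """Forward application (>): X/Y combined with Y yields X."""
    if isinstance(fn, Functor) and fn.slash == "/" and fn.argument == arg:
        return fn.result
    return None


# A transitive verb category (S[dcl]\NP)/NP consumes its object NP first,
# leaving a verb phrase category S[dcl]\NP:
np = Atom("NP")
tv = Functor(Functor(Atom("S[dcl]"), "\\", np), "/", np)
vp = forward_apply(tv, np)
print(vp)  # → (S[dcl]\NP)
```

A constructive supertagger in this sense predicts the internal nodes of such trees (slashes, atoms) rather than treating each full category string as an opaque label, which is what lets it produce rare or unseen categories.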