Unlexicalized Transition-based Discontinuous Constituency Parsing

Maximin Coavoux, Benoît Crabbé, Shay B Cohen

doi:10.1162/tacl_a_00255

Abstract

Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it with lexicalized parsing models in order to address the question of lexicalization in the context of discontinuous constituency parsing. Our experiments show that unlexicalized models systematically achieve higher results than lexicalized models, and provide additional empirical evidence that lexicalization is not necessary to achieve strong parsing results. Our best unlexicalized model sets a new state of the art on English and German discontinuous constituency treebanks. We further provide a per-phenomenon analysis of its errors on discontinuous constituents.

Highlights

This paper introduces an unlexicalized parsing model and addresses the question of lexicalization, as a parser design choice, in the context of transition-based discontinuous constituency parsing.
In a lexicalized Probabilistic Context-Free Grammar (PCFG), grammar rules involve nonterminals annotated with a terminal element that represents their lexical head, for example: VP[saw] → VP[saw] PP[telescope].
Bikel (2004) showed that bilexical statistics were rarely used during decoding, and that when they were used, their values were close to those of the backoff distributions used for unknown word pairs.

Summary

Introduction

This paper introduces an unlexicalized parsing model and addresses the question of lexicalization, as a parser design choice, in the context of transition-based discontinuous constituency parsing. Lexicalized parsing models (Collins, 1997; Charniak, 1997) are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In a lexicalized Probabilistic Context-Free Grammar (PCFG), grammar rules involve nonterminals annotated with a terminal element that represents their lexical head, for example: VP[saw] → VP[saw] PP[telescope]. The probability of such a rule models the likelihood that telescope is a suitable modifier for saw. Bikel (2004) showed that bilexical statistics were rarely used during decoding, and that when they were used, their values were close to those of the backoff distributions used for unknown word pairs.
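To make the role of bilexical statistics concrete, here is a minimal sketch of how a head-modifier probability might be estimated and smoothed with a backoff distribution. It is not the estimation scheme of Collins (1997), Charniak (1997), or the parser introduced in this paper: the counts are invented, and the function names and the interpolation weight `k` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy counts of (head word, modifier head word) pairs, as if read
# off attachments in a treebank. These numbers are invented for illustration.
PAIR_COUNTS = Counter({
    ("saw", "telescope"): 3,
    ("saw", "park"): 1,
    ("man", "telescope"): 1,
    ("man", "hat"): 3,
})


def head_count(head, pairs=PAIR_COUNTS):
    """Total number of modifiers observed with this head word."""
    return sum(c for (h, _), c in pairs.items() if h == head)


def bilexical_prob(head, modifier, pairs=PAIR_COUNTS):
    """Relative-frequency estimate of P(modifier | head); 0.0 for unseen heads."""
    total = head_count(head, pairs)
    return pairs[(head, modifier)] / total if total else 0.0


def backoff_prob(modifier, pairs=PAIR_COUNTS):
    """Estimate of P(modifier) that ignores the head word entirely."""
    total = sum(pairs.values())
    mod_total = sum(c for (_, m), c in pairs.items() if m == modifier)
    return mod_total / total


def smoothed_prob(head, modifier, k=5.0, pairs=PAIR_COUNTS):
    """Interpolate both estimates; rarer heads lean harder on the backoff.

    The weight lam = count(head) / (count(head) + k) is an assumed, simple
    choice, not the one used by any particular published parser.
    """
    n = head_count(head, pairs)
    lam = n / (n + k)
    return lam * bilexical_prob(head, modifier, pairs) + (1 - lam) * backoff_prob(modifier, pairs)


if __name__ == "__main__":
    print(bilexical_prob("saw", "telescope"))      # bilexical estimate
    print(backoff_prob("telescope"))               # head-independent estimate
    print(smoothed_prob("glimpsed", "telescope"))  # unseen head: pure backoff
```

With these toy counts, telescope is a more likely modifier of saw than of man, which is the kind of preference a lexicalized rule is meant to capture. For an unseen head the smoothed estimate collapses to the backoff value, and with sparse counts it stays close to it, which is in the spirit of the observation attributed to Bikel (2004) above.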



Journal: Transactions of the Association for Computational Linguistics
Publication Date: Apr 1, 2019
Citations: 56
License type: CC BY 4.0


