Abstract

Prague Tectogrammatical Graphs (PTG) is a meaning representation framework that originates in the tectogrammatical layer of the Prague Dependency Treebank (PDT) and is theoretically grounded in the Functional Generative Description (FGD) of language. PTG in its present form was prepared for the CoNLL 2020 shared task on Cross-Framework Meaning Representation Parsing (MRP). It is generated automatically from the Prague treebanks and stored in the JSON-based MRP graph interchange format. The conversion is partially lossy; in this paper we describe which parts of the annotation are included and how they are represented in PTG.

Highlights

  • The Functional Generative Description (FGD) (Sgall, 1967; Sgall et al., 1986), as instantiated in the Prague family of dependency treebanks, defines four layers of description: 1. the word layer; 2. the morphological layer; 3. the analytical layer; 4. the tectogrammatical layer

  • The meaning representation used in the CoNLL 2020 shared task (Oepen et al., 2020) is based mostly on the tectogrammatical layer; references have to be followed all the way down to the word layer in order to anchor graph nodes in the underlying text

  • The English data was taken from the same sources as in the previous shared task (CoNLL Meaning Representation Parsing (MRP) 2019; Oepen et al., 2019); that task used a different conversion procedure, which produced different target graphs, known as Prague Semantic Dependencies (PSD; Miyao et al., 2014)


Summary

Introduction

The Functional Generative Description (FGD) (Sgall, 1967; Sgall et al., 1986), as instantiated in the Prague family of dependency treebanks, defines four layers of description: 1. the word layer; 2. the morphological layer; 3. the analytical (surface-syntactic) layer; 4. the tectogrammatical (deep-syntactic) layer. The meaning representation used in the CoNLL 2020 shared task (Oepen et al., 2020) is based mostly on the tectogrammatical layer; references have to be followed all the way down to the word layer in order to anchor graph nodes in the underlying text. The shared task featured PTG data in two languages: English and Czech. Because the same data is also annotated in other frameworks of the shared task, the training-development-test split was synchronized across frameworks, and a handful of sentences were omitted because they did not align with the original WSJ text. Because the representations in the shared task are not restricted to trees, additional edges were added to encode paratactic structures and coreference more directly.
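To make the anchoring idea concrete, the following is a minimal sketch of how one graph in the JSON-based MRP interchange format might be read and its nodes mapped back to substrings of the input text. The sample record is invented for illustration and is not taken from the PTG data; the field names (`input`, `nodes`, `anchors` with `from`/`to` character offsets, `edges` with `source`/`target`/`label`) follow the MRP format as we understand it and should be checked against the official format description. The helper names `anchored_text` and `describe` are hypothetical.

```python
import json

# Hypothetical one-line MRP record; the sentence, labels, and edge are
# invented for illustration and do not come from the actual PTG data.
SAMPLE_LINE = json.dumps({
    "id": "example-0",
    "framework": "ptg",
    "input": "Mary sleeps.",
    "tops": [1],
    "nodes": [
        {"id": 0, "label": "Mary", "anchors": [{"from": 0, "to": 4}]},
        {"id": 1, "label": "sleep", "anchors": [{"from": 5, "to": 11}]},
    ],
    "edges": [{"source": 1, "target": 0, "label": "ACT"}],
})


def anchored_text(graph, node):
    """Return the substrings of the input text that a node is anchored to.

    A node without an "anchors" field yields an empty list (assumed here to
    be how unanchored nodes appear).
    """
    text = graph["input"]
    return [text[a["from"]:a["to"]] for a in node.get("anchors", [])]


def describe(line):
    """Parse one JSON line and return (node id -> anchored spans, edge triples)."""
    graph = json.loads(line)
    spans = {n["id"]: anchored_text(graph, n) for n in graph["nodes"]}
    edges = [(e["source"], e["label"], e["target"]) for e in graph["edges"]]
    return spans, edges


spans, edges = describe(SAMPLE_LINE)
print(spans)
print(edges)
```

Because edges are stored as a flat list of source-target pairs rather than as a parent pointer per node, a node can have several incoming edges, which is what allows the representation to go beyond trees.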

Graph Properties and Anchoring
Edge Types
Generated Nodes
Coreference
Node Properties
Other Crops