FGD at MRP 2020: Prague Tectogrammatical Graphs

Daniel Zeman, Jan Hajič

doi:10.18653/v1/2020.conll-shared.3

Abstract

Prague Tectogrammatical Graphs (PTG) is a meaning representation framework that originates in the tectogrammatical layer of the Prague Dependency Treebank (PDT) and is theoretically founded in the Functional Generative Description (FGD) of language. PTG in its present form has been prepared for the CoNLL 2020 shared task on Cross-Framework Meaning Representation Parsing (MRP). It is generated automatically from the Prague treebanks and stored in the JSON-based MRP graph interchange format. The conversion is partially lossy; in this paper we describe which parts of the annotation were included and how they are represented in PTG.
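For readers unfamiliar with the interchange format, the following minimal sketch builds a toy PTG-style graph for the sentence "John sleeps." as a Python dictionary and serializes it to JSON. The field names (`id`, `framework`, `input`, `tops`, `nodes`, `edges`, and `anchors` with `from`/`to` character offsets) follow the MRP format as we understand it; the sentence, the node labels, the `ACT` edge and the `flavor` value are illustrative assumptions rather than actual task data, and real PTG nodes carry additional properties that are omitted here.

```python
import json


def make_toy_graph() -> dict:
    """Build a hypothetical two-node PTG-style graph in MRP-like JSON form."""
    text = "John sleeps."
    return {
        "id": "toy-0001",      # hypothetical sentence identifier
        "flavor": 1,           # assumption: anchored, non-bilexical graph
        "framework": "ptg",
        "input": text,
        "tops": [1],           # the verb is the top node
        "nodes": [
            {"id": 0, "label": "John", "anchors": [{"from": 0, "to": 4}]},
            {"id": 1, "label": "sleep", "anchors": [{"from": 5, "to": 11}]},
        ],
        "edges": [
            # Assumed direction: governor -> dependent, labelled with a functor.
            {"source": 1, "target": 0, "label": "ACT"},
        ],
    }


def anchored_text(graph: dict, node_id: int) -> list:
    """Return the substrings of the input covered by one node's anchors."""
    node = next(n for n in graph["nodes"] if n["id"] == node_id)
    return [graph["input"][a["from"]:a["to"]] for a in node.get("anchors", [])]


if __name__ == "__main__":
    g = make_toy_graph()
    print(json.dumps(g))        # an .mrp file stores one such object per line
    print(anchored_text(g, 1))  # ['sleeps']
```

Note that the node label is the (tectogrammatical) lemma `sleep`, while its anchor points to the inflected surface form `sleeps`; the anchors are what tie the abstract graph back to the text.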

Highlights

The Functional Generative Description (FGD) (Sgall, 1967; Sgall et al, 1986), as instantiated in the Prague family of dependency treebanks, defines four layers of description: 1. the word layer; 2. the morphological layer; 3. the analytical layer; 4. the tectogrammatical layer.
The meaning representation used in the CoNLL 2020 shared task (Oepen et al, 2020) is based mostly on the tectogrammatical layer; references have to be followed all the way down to the word layer in order to anchor graph nodes in the underlying text.
The English data was taken from the same sources as in the previous shared task (CoNLL Meaning Representation Parsing (MRP) 2019; Oepen et al, 2019), but a different conversion procedure was used in that task, leading to different target graphs known as Prague Semantic Dependencies (PSD; Miyao et al, 2014).
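Following references from a tectogrammatical node down to the word layer, as described above, can be sketched as follows. The classes and fields (`TNode.a_lex`, `TNode.a_aux`, `ANode.m_ref`, `MNode.w_refs`) are hypothetical simplifications, loosely modelled on the idea that a tectogrammatical node points to one lexical and possibly several auxiliary analytical nodes. The sketch also assumes that every word-layer token already carries character offsets into the sentence; that is an assumption made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WNode:
    """Word-layer token; character offsets are assumed to be precomputed."""
    start: int
    end: int


@dataclass
class MNode:
    """Morphological-layer unit referring to one or more word-layer tokens."""
    w_refs: List[str]


@dataclass
class ANode:
    """Analytical-layer node referring to one morphological unit."""
    m_ref: str


@dataclass
class TNode:
    """Tectogrammatical node with hypothetical lexical/auxiliary references."""
    lemma: str
    a_lex: Optional[str] = None                     # None: no lexical counterpart
    a_aux: List[str] = field(default_factory=list)  # e.g. prepositions


def anchors(t: TNode, a_layer: Dict[str, ANode], m_layer: Dict[str, MNode],
            w_layer: Dict[str, WNode]) -> List[dict]:
    """Follow t -> a -> m -> w references and return sorted MRP-style anchors."""
    spans = set()
    a_ids = ([t.a_lex] if t.a_lex is not None else []) + t.a_aux
    for a_id in a_ids:
        m_node = m_layer[a_layer[a_id].m_ref]
        for w_id in m_node.w_refs:
            w = w_layer[w_id]
            spans.add((w.start, w.end))
    return [{"from": s, "to": e} for s, e in sorted(spans)]


# Toy data for "He went to Prague." (all identifiers are made up)
W = {"w1": WNode(0, 2), "w2": WNode(3, 7), "w3": WNode(8, 10),
     "w4": WNode(11, 17), "w5": WNode(17, 18)}
M = {f"m{i}": MNode([f"w{i}"]) for i in range(1, 6)}
A = {f"a{i}": ANode(f"m{i}") for i in range(1, 6)}

prague = TNode("Prague", a_lex="a4", a_aux=["a3"])  # "to" is auxiliary
generated = TNode("#PersPron")                       # no analytical references

if __name__ == "__main__":
    print(anchors(prague, A, M, W))     # [{'from': 8, 'to': 10}, {'from': 11, 'to': 17}]
    print(anchors(generated, A, M, W))  # []
```

In this sketch, the auxiliary preposition contributes its own anchor to the node for "Prague", while a node with no analytical references (such as a generated node) simply ends up with an empty anchor list.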

Summary

Introduction

The Functional Generative Description (FGD) (Sgall, 1967; Sgall et al, 1986), as instantiated in the Prague family of dependency treebanks, defines four layers of description: 1. the word layer; 2. the morphological layer; 3. the analytical (surface-syntactic) layer; 4. the tectogrammatical (deep-syntactic) layer. The meaning representation used in the CoNLL 2020 shared task (Oepen et al, 2020) is based mostly on the tectogrammatical layer; references have to be followed all the way down to the word layer in order to anchor graph nodes in the underlying text. The shared task featured PTG data in two languages: English and Czech. Because the same data is annotated in other frameworks of the shared task as well, the training/development/test split was synchronized across frameworks, and a handful of sentences were omitted because they did not align with the original WSJ text. Since the representations in the shared task are not restricted to trees, additional edges were added to encode paratactic structures and coreference more directly.
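The idea of adding edges for paratactic structures can be illustrated with a small example. In Prague-style trees, a modifier shared by coordinated conjuncts hangs from the coordination node itself; the sketch below adds one extra edge from each conjunct (a child marked as a member) to each shared modifier, and turns a list of coreference links into extra edges. This is one plausible way to derive such edges, not necessarily the exact procedure used to build PTG; the `TNode` tuple, the functor subset in `COORD_FUNCTORS` and the `coref` label are assumptions made for illustration, and nested coordinations are not handled.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class TNode(NamedTuple):
    lemma: str
    parent: Optional[int]    # tree parent, None for the root
    functor: str             # label of the edge from the parent
    is_member: bool = False  # True if the node is a conjunct of its parent


Edge = Tuple[int, int, str]  # (source, target, label)

# Assumed subset of functors that mark coordination nodes.
COORD_FUNCTORS = {"CONJ", "DISJ", "ADVS"}


def tree_edges(nodes: Dict[int, TNode]) -> List[Edge]:
    """Original tree edges, pointing from parent to child."""
    return [(n.parent, i, n.functor) for i, n in nodes.items() if n.parent is not None]


def paratactic_edges(nodes: Dict[int, TNode]) -> List[Edge]:
    """Extra edges from every conjunct to every modifier shared by the coordination."""
    extra: List[Edge] = []
    for cid, coord in nodes.items():
        if coord.functor not in COORD_FUNCTORS:
            continue
        children = [(i, n) for i, n in nodes.items() if n.parent == cid]
        members = [i for i, n in children if n.is_member]
        shared = [(i, n) for i, n in children if not n.is_member]
        for m in members:
            for s, n in shared:
                extra.append((m, s, n.functor))
    return extra


def coref_edges(links: List[Tuple[int, int]], label: str = "coref") -> List[Edge]:
    """Turn (anaphor, antecedent) pairs into edges; the label is hypothetical."""
    return [(anaphor, antecedent, label) for anaphor, antecedent in links]


# "John sang and danced.": "John" is the actor of both verbs,
# so in the tree it hangs from the coordination node "and".
nodes = {
    0: TNode("and", None, "CONJ"),
    1: TNode("John", 0, "ACT"),
    2: TNode("sing", 0, "PRED", is_member=True),
    3: TNode("dance", 0, "PRED", is_member=True),
}

if __name__ == "__main__":
    print(tree_edges(nodes))        # [(0, 1, 'ACT'), (0, 2, 'PRED'), (0, 3, 'PRED')]
    print(paratactic_edges(nodes))  # [(2, 1, 'ACT'), (3, 1, 'ACT')]
```

The original tree edges are kept; the extra edges make the shared actor a direct dependent of each verb, so "John" now has several incoming edges and the result is a general graph rather than a tree.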

Graph Properties and Anchoring

Edge Types

Generated Nodes

Coreference

Node Properties

Other Crops