Abstract

The paper describes the construction of a Bulgarian-English treebank aligned at the word and semantic levels. We consider manual word-level alignment easier and more reliable than manual alignment at the syntactic and semantic levels. Thus, after manual word-level alignment, we apply an automatic procedure to construct the semantic-level alignments. We present the main steps of this automatic procedure, which exploits the syntactic analyses of both sentences, the morphosyntactic annotation and the manual word-level alignment to produce the semantic annotation of the sentences and the semantic alignment. Finally, we discuss a method for identifying potential errors, which compares the automatically constructed semantic analyses of the Bulgarian sentences with the semantic representations of the English sentences.

Highlights

  • In this paper we report on the design of the annotation schema of the Bulgarian-English Parallel Treebank (BulEngTreebank) and on semiautomatic error correction of the automatic dependency analysis of the Bulgarian sentences.

  • In this paper we present the levels of annotation of the treebank, the types of rules for constructing MRS structures over dependency parses, and the error detection procedure that supports manual editing of the dependency analyses of Bulgarian sentences.

  • The semantic-level annotation is constructed automatically via an HPSG English grammar and a hybrid architecture for Bulgarian: an HPSG Bulgarian grammar or a dependency parser.

Summary

Introduction

In this paper we report on the design of the annotation schema of the Bulgarian-English Parallel Treebank (BulEngTreebank) and on semiautomatic error correction of the automatic dependency analysis of the Bulgarian sentences. The annotation procedure is as follows: first, the Bulgarian sentences are parsed with BURGER. If parsing succeeds, the resulting MRS structures are used for the alignment. We believe that, given word-level alignments, syntactic analyses and the rules for composing MRS structures, we will be able to determine correspondences between MRS structures larger than single lexical items, using the ideas of Tinsley et al. (2009). In this paper we present the levels of annotation of the treebank, the types of rules for constructing MRS structures over dependency parses, and the error detection procedure that supports manual editing of the dependency analyses of Bulgarian sentences.
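The procedure outlined above can be pictured with a small sketch. It is illustrative only and not the authors' implementation: the `Analysis` class, the `analyse` and `lexical_alignment` functions, the toy parsers and the predicate names are all hypothetical, and an MRS is reduced here to a list of (token index, predicate) pairs. The sketch assumes that the dependency route is used when the deep grammar yields no analysis, and it projects manual word links onto predicate pairs only at the lexical level.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Pred = Tuple[int, str]  # (token index, predicate name): a toy stand-in for an MRS


@dataclass
class Analysis:
    route: str         # "hpsg" or "dependency": which analyser produced the result
    preds: List[Pred]  # simplified semantic representation


def analyse(tokens: List[str],
            deep_parse: Callable[[List[str]], Optional[List[Pred]]],
            dep_to_mrs: Callable[[List[str]], List[Pred]]) -> Analysis:
    """Try the deep grammar first; fall back to the dependency route (assumed)."""
    preds = deep_parse(tokens)
    if preds is not None:
        return Analysis("hpsg", preds)
    return Analysis("dependency", dep_to_mrs(tokens))


def lexical_alignment(bg: Analysis, en: Analysis,
                      word_links: Set[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Project manual word links (bg index, en index) onto predicate pairs."""
    bg_by_tok = dict(bg.preds)
    en_by_tok = dict(en.preds)
    return [(bg_by_tok[b], en_by_tok[e])
            for b, e in sorted(word_links)
            if b in bg_by_tok and e in en_by_tok]


# Toy analysers (hypothetical): the "deep" parser fails on any sentence with "?".
def toy_deep_parse(tokens: List[str]) -> Optional[List[Pred]]:
    if "?" in tokens:
        return None
    return [(i, f"_{t}_rel") for i, t in enumerate(tokens)]


def toy_dep_to_mrs(tokens: List[str]) -> List[Pred]:
    return [(i, f"_{t}_rel") for i, t in enumerate(tokens)]


if __name__ == "__main__":
    bg = analyse(["kuche", "lae"], toy_deep_parse, toy_dep_to_mrs)
    en = Analysis("hpsg", [(0, "_the_q"), (1, "_dog_n"), (2, "_bark_v")])
    print(bg.route, lexical_alignment(bg, en, {(0, 1), (1, 2)}))
```

Keeping the `route` field on each analysis is a design choice made for the sketch: it records which analyser produced the semantics, so that analyses from the dependency route can be singled out later for checking.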

Background and Related Work
Word Level Alignment
Bulgarian Dependency Parsing and RMRS Analysis
Semantic Level Alignment
Error Detection
Findings
Conclusion