Abstract
The paper describes the construction of a Bulgarian-English treebank aligned at the word level and the semantic level. We consider manual word-level alignment easier and more reliable than manual alignment at the syntactic and semantic levels. Thus, after the manual word-level alignment, we apply an automatic procedure to construct the semantic-level alignments. We present the main steps of this automatic procedure, which exploits the syntactic analyses of both sentences, the morphosyntactic annotation, and the manual word-level alignment to produce the semantic annotation of the sentences and the semantic alignment. Finally, we discuss a method for identifying potential errors, which compares the automatically constructed semantic analyses of the Bulgarian sentences with the semantic representations of the corresponding English sentences.
Highlights
In this paper we report on the design of the annotation schema of the Bulgarian-English Parallel Treebank (BulEngTreebank) and on the semi-automatic correction of errors in the automatic dependency analysis of the Bulgarian sentences.
In this paper we present the annotation levels of the treebank, the types of rules for constructing MRS structures over dependency parses, and the error-detection procedure that supports manual editing of the dependency analyses of the Bulgarian sentences.
The semantic-level annotation is constructed automatically, via an HPSG grammar for English and a hybrid architecture for Bulgarian that uses either an HPSG grammar for Bulgarian or a dependency parser.
Summary
In this paper we report on the design of the annotation schema of the Bulgarian-English Parallel Treebank (BulEngTreebank) and on the semi-automatic correction of errors in the automatic dependency analysis of the Bulgarian sentences. The annotation procedure is as follows: first, the Bulgarian sentences are parsed with BURGER. If parsing succeeds, the resulting MRS structures are used for the alignment. Our belief is that, given the word-level alignments, the syntactic analyses, and the rules for composing MRS structures, we will be able to determine correspondences between MRS structures larger than single lexical items, using the ideas of Tinsley et al. (2009). In this paper we present the annotation levels of the treebank, the types of rules for constructing MRS structures over dependency parses, and the error-detection procedure that supports manual editing of the dependency analyses of the Bulgarian sentences.
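One way to picture how word-level links could license correspondences between larger structures is the sketch below. It is not the authors' procedure. It assumes a simple consistency criterion of the kind commonly used in subtree and phrase alignment: two fragments may be paired when at least one word link connects them and no word link leaves either fragment. The sentence pair, the token indices, the fragment sets, and the names `consistent` and `align_fragments` are all hypothetical and serve only as illustration; each fragment stands for the set of tokens covered by some substructure of a sentence's analysis.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: a word-level link is a pair
# (bulgarian_token_index, english_token_index).
Link = Tuple[int, int]
Fragment = FrozenSet[int]


def consistent(bg: Fragment, en: Fragment, links: Set[Link]) -> bool:
    """True if some link connects the two fragments and no link crosses out of them."""
    inside = [(b, e) for b, e in links if b in bg and e in en]
    crossing = [(b, e) for b, e in links if (b in bg) != (e in en)]
    return bool(inside) and not crossing


def align_fragments(
    bg_fragments: List[Fragment],
    en_fragments: List[Fragment],
    links: Set[Link],
) -> List[Tuple[Fragment, Fragment]]:
    """Return every (Bulgarian, English) fragment pair that is consistent with the links."""
    return [
        (bg, en)
        for bg in bg_fragments
        for en in en_fragments
        if consistent(bg, en, links)
    ]


# Invented example, not taken from the treebank.
# Bulgarian: 0 Момчето  1 чете  2 книга
# English:   0 The  1 boy  2 reads  3 a  4 book   ("a" is left unaligned)
links: Set[Link] = {(0, 0), (0, 1), (1, 2), (2, 4)}

bg_fragments = [frozenset({0}), frozenset({2}), frozenset({1, 2}), frozenset({0, 1, 2})]
en_fragments = [
    frozenset({0, 1}),
    frozenset({3, 4}),
    frozenset({2, 3, 4}),
    frozenset({0, 1, 2, 3, 4}),
]

pairs = align_fragments(bg_fragments, en_fragments, links)
for bg, en in pairs:
    print(sorted(bg), "<->", sorted(en))
```

In this toy example the criterion pairs the subject fragments, the object fragments, the verb-plus-object fragments, and the two full sentences, while rejecting pairs such as the Bulgarian object with the English verb phrase, because the link from the verb would cross the fragment boundary. Unaligned tokens, such as the English article, do not block a pairing.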