Abstract

Željko Agić, Jörg Tiedemann, Danijela Merkler, Simon Krek, Kaja Dobrovoljc, Sara Može. Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants. 2014.

Highlights

  • A large majority of human languages are under-resourced in terms of text corpora and tools available for natural language processing (NLP) applications

  • We make use of the publicly available language resources for Croatian, Serbian and Slovene. These include dependency treebanks, test sets annotated for morphology and dependency syntax, and a morphosyntactic feature representation drawing from the Multext East project (Erjavec, 2012)

  • In the Multext East tagset, the first character of each morphosyntactic tag denotes the part of speech (POS), while each following character encodes a specific attribute in a specific position. Both the positions and the attributes are language-dependent in Multext East version 4 (MTE 4), but the attributes are still largely shared between these three languages due to their relatedness
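The positional encoding described above can be illustrated with a small decoder. This is a minimal sketch, not the MTE 4 specification: the attribute tables (`ATTRIBUTES`, `POS_NAMES`) and the function `decode_tag` are hypothetical and cover only two parts of speech. The real per-language MTE 4 tables define which attribute sits at each position, which is why the table is passed in as a parameter rather than hard-coded.

```python
# Hypothetical, partial attribute tables for illustration only. The real MTE 4
# specifications define positions and values separately for each language.
ATTRIBUTES = {
    "N": ("Type", "Gender", "Number", "Case"),
    "V": ("Type", "VForm", "Person", "Number"),
}
POS_NAMES = {"N": "Noun", "V": "Verb"}


def decode_tag(tag: str, attributes=ATTRIBUTES, pos_names=POS_NAMES) -> dict:
    """Split a positional tag into its POS and per-position attribute codes."""
    if not tag:
        raise ValueError("empty tag")
    pos, codes = tag[0], tag[1:]  # first character is the part of speech
    names = attributes.get(pos, ())
    if len(codes) > len(names):
        raise ValueError(f"tag {tag!r} has more positions than defined for {pos!r}")
    features = {"POS": pos_names.get(pos, pos)}
    # Each remaining character is read according to its position in the table.
    for name, code in zip(names, codes):
        if code != "-":  # '-' marks an attribute that does not apply
            features[name] = code
    return features


if __name__ == "__main__":
    print(decode_tag("Ncmsn"))
    print(decode_tag("N-m"))
```

Because attributes are identified by name after decoding, tags from two related languages can be compared attribute by attribute even when a language-specific table places an attribute at a different position.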


Summary

Introduction

A large majority of human languages are under-resourced in terms of text corpora and tools available for natural language processing (NLP) applications. We focus on dependency parsing (Kübler et al., 2009), but the claims should hold in general. Dependency treebanks are scarce because they are expensive and time-consuming to construct (Abeillé, 2003). Since dependency parsing of under-resourced languages draws substantial interest in the NLP research community, a number of research efforts have been directed over time towards processing these languages despite the absence of training data for supervised learning of parsing models. We focus on supervised learning of dependency parsers, as the performance of unsupervised approaches still falls far behind the state of the art in supervised parser induction.

Related Work
Paper Overview
Resources
Morphosyntactic Tagset
Test Sets
Workflow
Dependency Parsing
Treebank Translation and Annotation Projection
Results and Discussion
Monolingual Parsing
Direct Cross-lingual Parsing
Cross-lingual Parsing with Treebank Translation
Conclusions and Future Work

