Abstract

This paper describes our dependency parsing system for the CoNLL-2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We focus primarily on the low-resource languages (surprise languages). We have developed a framework that combines multiple treebanks to train parsers for low-resource languages using a delexicalization method. We transform the source language treebanks according to syntactic features of the low-resource language to improve parser performance. In the official evaluation, our system achieves a macro-averaged LAS of 67.61 on the entire blind test data and 37.16 on the surprise language test data.

Highlights

  • A dependency parser analyzes the relations among the words in a sentence to determine the syntactic dependencies among them; the dependency relations are drawn from a fixed set of grammatical relations

  • Recent work has focused on developing dependency parsers for low-resource languages, i.e., languages for which few or no treebanks are available, through cross-lingual transfer parsing methods that use knowledge derived from treebanks of other languages together with the resources available for the low-resource languages (McDonald et al, 2011; Tiedemann, 2015; Zeman and Resnik, 2008; Rasooli and Collins, 2015)

  • The sentences are transformed by traversing the trees according to the ordering of the dependencies in the target language (TL). At each node, the subtrees of the modifiers in the pre-modifier list, and of the modifiers in the other-modifier list that appear before the current word in the source language (SL) sentence, are traversed first. The word of the current node is then added to the transformed word list. Finally, the subtrees of the modifiers in the post-modifier list, and of the modifiers in the other-modifier list that appear after the current word in the SL sentence, are traversed
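The traversal described in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` class, the `reorder` function, and the representation of the pre-modifier and post-modifier lists as sets of dependency labels are assumptions made for the example. The order of modifiers within the "before" and "after" groups is not specified in the text; the sketch simply keeps their SL order.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One word of a source language (SL) dependency tree (hypothetical type)."""
    index: int                      # position of the word in the SL sentence
    word: str
    deprel: str                     # dependency relation to the head
    children: List["Node"] = field(default_factory=list)


def reorder(node: Node, pre: Set[str], post: Set[str]) -> List[str]:
    """Return the words of the subtree under `node` in target language (TL) order.

    `pre` and `post` hold the relations whose modifiers precede or follow
    their head in the TL. Any other modifier keeps the side it has in the SL.
    """
    before, after = [], []
    # Assumption: modifiers keep their relative SL order within each group.
    for child in sorted(node.children, key=lambda c: c.index):
        if child.deprel in pre:
            before.append(child)
        elif child.deprel in post:
            after.append(child)
        elif child.index < node.index:
            before.append(child)    # other-modifier appearing before the head
        else:
            after.append(child)     # other-modifier appearing after the head

    words: List[str] = []
    for child in before:
        words.extend(reorder(child, pre, post))
    words.append(node.word)         # the current word goes between the groups
    for child in after:
        words.extend(reorder(child, pre, post))
    return words


# Toy SL sentence: "John eats red apples"
apples = Node(4, "apples", "obj", [Node(3, "red", "amod")])
root = Node(2, "eats", "root", [Node(1, "John", "nsubj"), apples])

# Hypothetical TL in which objects precede the verb and adjectives follow the noun.
result = reorder(root, pre={"obj"}, post={"amod"})
print(" ".join(result))  # John apples red eats
```

In this toy case the subject is in neither list, so it stays before the verb as in the SL sentence, while the object subtree moves in front of the verb and the adjective moves after its noun.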


Summary

Introduction

A dependency parser analyzes the relations among the words in a sentence to determine the syntactic dependencies among them; the dependency relations are drawn from a fixed set of grammatical relations. The Universal Dependencies (http://universaldependencies.org/) (Nivre et al, 2016) project has enabled the development of consistent treebanks for several languages using a uniform scheme for PoS, morphological feature and dependency relation tagging. This has greatly helped research in multilingual parsing, cross-lingual transfer parsing and the comparison of language structures across languages. The CoNLL 2017 shared task focuses on learning syntactic parsers that start from raw text and work across several typologically different languages, including surprise languages for which no training data is available, using the common annotation scheme (UD v2).

Corpus and resources
System description
Surprise language
Syntactic feature based transformation
Transformation features
Tree-traversal based transformation algorithm
Steps for training the parser for surprise languages
Experiments and results
Conclusion