Delexicalized transfer parsing for low-resource languages using transformed and combined treebanks

Ayan Das, Sudeshna Sarkar, Affan Zaffar

doi:10.18653/v1/k17-3019

Abstract

This paper describes our dependency parsing system for the CoNLL-2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We focus primarily on the low-resource languages (surprise languages). We have developed a framework that combines multiple treebanks to train parsers for low-resource languages using delexicalization. We transform the source-language treebanks according to syntactic features of the low-resource language to improve parser performance. In the official evaluation, our system achieves macro-averaged LAS scores of 67.61 and 37.16 on the entire blind test data and the surprise language test data, respectively.

Highlights

A dependency parser analyzes the relations among the words in a sentence to determine the syntactic dependencies among them, where the dependency relations are drawn from a fixed set of grammatical relations.
There has been considerable recent focus on developing dependency parsers for low-resource languages, i.e., languages for which few or no treebanks are available, through cross-lingual transfer parsing methods that use knowledge derived from treebanks of other languages together with the resources available for the low-resource languages (McDonald et al., 2011; Tiedemann, 2015; Zeman and Resnik, 2008; Rasooli and Collins, 2015).
The sentences are transformed by traversing the trees according to the ordering of the dependencies in the target language (TL). For example, at each node, the subtrees of the modifiers in the pre-modifier list, and of the modifiers in the other-modifier list that appear before the current word in the source language (SL) sentence, are traversed first. The word of the current node is then added to the transformed word list, followed by traversal of the subtrees of the modifiers in the post-modifier list and of the modifiers in the other-modifier list that appear after the current word in the SL sentence.
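
The traversal described in the last highlight can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `transform_sentence`, the parameters `pre_rels` and `post_rels` (sets of dependency relations whose dependents should precede or follow their head in the TL), and the choice to keep modifiers on each side in their SL order are all assumptions made for the example.

```python
from collections import defaultdict


def transform_sentence(words, heads, deprels, pre_rels, post_rels):
    """Reorder an SL sentence to approximate TL word order.

    words   : tokens in SL order
    heads   : 1-based head index of each token (0 marks the root)
    deprels : dependency relation of each token to its head
    pre_rels / post_rels : hypothetical sets of relations whose dependents
        precede / follow the head in the TL; every other relation is treated
        as an "other modifier" and keeps its SL side of the head.
    """
    # Map each head to its modifiers, in SL order.
    children = defaultdict(list)
    for idx, head in enumerate(heads, start=1):
        children[head].append(idx)

    order = []

    def visit(node):
        before, after = [], []
        for child in children[node]:
            rel = deprels[child - 1]
            if rel in pre_rels:
                before.append(child)      # pre-modifier list
            elif rel in post_rels:
                after.append(child)       # post-modifier list
            elif child < node:
                before.append(child)      # other modifier, left of head in SL
            else:
                after.append(child)       # other modifier, right of head in SL
        for child in before:
            visit(child)
        order.append(node)                # emit the current word
        for child in after:
            visit(child)

    for root in children[0]:
        visit(root)
    return [words[i - 1] for i in order]


# SVO source sentence reordered for a verb-final target language.
print(transform_sentence(
    ["She", "reads", "books"],
    [2, 0, 2],
    ["nsubj", "root", "obj"],
    pre_rels={"nsubj", "obj"},
    post_rels=set(),
))  # ['She', 'books', 'reads']
```

In practice the relation sets would be derived from the syntactic features of the target language; here they are simply passed in by hand.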

Summary

Introduction

A dependency parser analyzes the relations among the words in a sentence to determine the syntactic dependencies among them, where the dependency relations are drawn from a fixed set of grammatical relations. The Universal Dependencies project (http://universaldependencies.org/) (Nivre et al., 2016) has enabled the development of consistent treebanks for several languages using a uniform scheme for PoS tags, morphological features and dependency relations. This has greatly helped research in multilingual parsing, cross-lingual transfer parsing and the comparison of language structures across languages. The CoNLL 2017 shared task focuses on learning syntactic parsers that start from raw text and work across many typologically different languages, including surprise languages for which no training data is available, using the common annotation scheme (UD v2).
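
The shared annotation scheme is what makes delexicalized transfer feasible: once the word forms are removed, treebanks from different languages expose the same PoS, feature and relation labels. The sketch below shows one common way to delexicalize a treebank in CoNLL-U format by blanking the FORM and LEMMA columns. It is an assumption about how such a step could look, not the paper's exact procedure; the function name `delexicalize_conllu` is hypothetical, and comment lines and the MISC column are left untouched.

```python
def delexicalize_conllu(text):
    """Blank FORM and LEMMA in every token line of a CoNLL-U string.

    CoNLL-U token lines have 10 tab-separated columns:
    ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    Comment lines (starting with '#') and blank lines are copied as-is.
    """
    out = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) == 10:
            cols[1] = "_"  # FORM
            cols[2] = "_"  # LEMMA
        out.append("\t".join(cols))
    return "\n".join(out)


sample = (
    "# sent_id = 1\n"
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_"
)
print(delexicalize_conllu(sample))
```

A parser trained on such output sees only PoS tags, morphological features and tree structure, so treebanks of several source languages can be concatenated for training.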

Corpus and resources

System description

Surprise language

Syntactic feature based transformation

Transformation features

Tree-traversal based transformation algorithm

Steps for training the parser for surprise languages

Experiments and results

Conclusion


Publication Date: Jan 1, 2017
Citations: 9
License type: cc-by

Similar Papers

Enriching the transfer learning with pre-trained lexicon embedding for low-resource neural machine translation
Mieradilijiang Maimaiti ... Huanbo Luan
Tsinghua Science and Technology | VOL. 27
01 Feb 2022

Multilingual Dependency Parsing for Low-Resource African Languages: Case Studies on Bambara, Wolof, and Yoruba
Cheikh M Bamba Dione
01 Jan 2020

NMT for a Low Resource Language Bodo: Preprocessing and Resource Modelling
Simanta Kalita ... Parvez Aziz Boruah
16 Mar 2023

Effectiveness of fractal dimension for ASR in low resource language
Mohammadi Zaki ... Hemant A Patil
01 Sep 2014
