Cross-lingual Parsing with Polyglot Training and Multi-treebank Learning: A Faroese Case Study

James Barry, Joachim Wagner, Jennifer Foster

doi:10.18653/v1/d19-6118

Abstract

Cross-lingual dependency parsing involves transferring syntactic knowledge from one language to another. It is a crucial component for inducing dependency parsers in low-resource scenarios where no training data for a language exists. Using Faroese as the target language, we compare two approaches using annotation projection: first, projecting from multiple monolingual source models; second, projecting from a single polyglot model which is trained on the combination of all source languages. Furthermore, we reproduce multi-source projection (Tyers et al., 2018), in which dependency trees of multiple sources are combined. Finally, we apply multi-treebank modelling to the projected treebanks, in addition to or alternatively to polyglot modelling on the source side. We find that polyglot training on the source languages produces an overall trend of better results on the target language but the single best result for the target language is obtained by projecting from monolingual source parsing models and then training multi-treebank POS tagging and parsing models on the target side.

Highlights

Cross-lingual transfer methods, i.e. methods that transfer knowledge from one or more source languages to a target language, have led to substantial improvements for low-resource dependency parsing (Rosa and Mareček, 2018; Agić et al., 2016; Guo et al., 2015; Lynn et al., 2014; McDonald et al., 2011; Hwa et al., 2005) and part-of-speech (POS) tagging (Plank and Agić, 2018).
Inspired by recent literature involving multilingual learning (Mulcaire et al., 2019; Smith et al., 2018; Vilares et al., 2016), we investigate whether additional improvements can be made by using a single polyglot parsing model, trained on the combination of all source languages, to create synthetic source treebanks.
We aim to investigate whether the current state-of-the-art approach for Faroese, which relies on cross-lingual transfer, can be improved upon by adopting an approach based on source-side polyglot learning and/or target-side multi-treebank learning.

Summary

Introduction

Cross-lingual transfer methods, i.e. methods that transfer knowledge from one or more source languages to a target language, have led to substantial improvements for low-resource dependency parsing (Rosa and Mareček, 2018; Agić et al., 2016; Guo et al., 2015; Lynn et al., 2014; McDonald et al., 2011; Hwa et al., 2005) and part-of-speech (POS) tagging (Plank and Agić, 2018). We build on recent work by Tyers et al. (2018), who show that, in the absence of annotated training data for the target language, a lexicalized treebank can be created by translating a target language corpus into a number of related source languages and parsing the translations using models trained on the source language treebanks. These annotations are projected to the target language using separate word alignments for each source language, combined into a single graph for each sentence, and decoded (Sagae and Lavie, 2006). The result is a treebank for the target language, which is Faroese both in Tyers et al.'s experiments and in ours. Tyers et al. (2018) describe a method for creating synthetic treebanks for Faroese, based on previous work, which uses machine translation and word alignments to transfer trees from the source language(s) to the target language. They show that, for Faroese, a combination of the four source languages (multi-source projection) is superior to projection from any individual language.
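The projection, combination and decoding steps described above can be illustrated with a minimal Python sketch. It is not the implementation used by Tyers et al. (2018) or in this paper. It assumes one-to-one word alignments, unlabelled trees, one vote per projected edge as the edge weight, and no single-root constraint, and it uses a plain Chu-Liu/Edmonds maximum spanning tree decoder. All function names (project_heads, vote_edges, decode, combine) are hypothetical.

```python
from collections import Counter

ROOT = 0  # index of the artificial root; real tokens are numbered from 1


def project_heads(source_heads, alignment):
    """Project a source tree onto target tokens (assumes a one-to-one alignment).

    source_heads: dict, source dependent index -> source head index (0 = root).
    alignment: dict, source token index -> target token index.
    Returns dict, target dependent -> target head, for edges that survive.
    """
    projected = {}
    for dep, head in source_heads.items():
        if dep not in alignment:
            continue  # unaligned dependent: nothing to project
        if head == ROOT:
            projected[alignment[dep]] = ROOT
        elif head in alignment:
            projected[alignment[dep]] = alignment[head]
    return projected


def vote_edges(projected_trees):
    """Count how many projected trees propose each (head, dependent) edge."""
    votes = Counter()
    for tree in projected_trees:
        for dep, head in tree.items():
            votes[(head, dep)] += 1
    return votes


def find_cycle(heads):
    """Return the nodes of one cycle in a dependent -> head map, or None."""
    for start in heads:
        path, node = [], start
        while node in heads and node not in path:
            path.append(node)
            node = heads[node]
        if node in path:
            return path[path.index(node):]
    return None


def decode(scores, nodes):
    """Chu-Liu/Edmonds: highest-scoring tree rooted at ROOT.

    scores: dict, (head, dependent) -> weight; every node needs an incoming edge.
    nodes: set of non-root node indices.
    """
    # Greedily pick the best incoming edge for every node.
    best = {}
    for d in nodes:
        best[d] = max((w, h) for (h, dd), w in scores.items() if dd == d)[1]
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # Contract the cycle into a fresh node c and re-score edges touching it.
    cyc, c = set(cycle), max(nodes) + 1
    new_scores, origin = {}, {}
    for (h, d), w in scores.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:  # entering the cycle: gain relative to the cycle edge it replaces
            key, val = (h, c), w - scores[(best[d], d)]
        elif h in cyc:  # leaving the cycle
            key, val = (c, d), w
        else:
            key, val = (h, d), w
        if key not in new_scores or val > new_scores[key]:
            new_scores[key], origin[key] = val, (h, d)
    sub = decode(new_scores, (nodes - cyc) | {c})
    # Expand: keep the cycle edges, then let the chosen entering edge break the cycle.
    result = {d: best[d] for d in cyc}
    for d, h in sub.items():
        orig_head, orig_dep = origin[(h, d)]
        result[orig_dep] = orig_head
    return result


def combine(projected_trees, n_tokens):
    """Merge projected trees into one vote-weighted graph and decode a single tree."""
    votes = vote_edges(projected_trees)
    nodes = set(range(1, n_tokens + 1))
    # Dense graph: edges nobody voted for get weight 0 so a tree always exists.
    scores = {(h, d): votes.get((h, d), 0)
              for h in [ROOT, *nodes] for d in nodes if h != d}
    return decode(scores, nodes)
```

For example, combine([{1: 2, 2: 0, 3: 2}, {1: 2, 2: 0, 3: 1}, {1: 3, 2: 0, 3: 2}], 3) keeps the edges proposed by at least two of the three projected trees and returns {1: 2, 2: 0, 3: 2}. Weighting edges by vote count is only one possible scheme.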



Publication Date: Jan 1, 2019
Citations: 31
License type: cc-by

