Abstract

We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. This method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.
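
To make the first of these steps concrete, here is a minimal sketch of one way cross-lingual word clusters can be derived; it is an illustration under simplifying assumptions, not the paper's exact procedure. The function name `build_mixed_corpus`, the replacement probability, and the tiny English/Spanish sentences and dictionaries are all hypothetical: the idea is that interleaving the two monolingual corpora and randomly swapping words for their dictionary translations lets a standard monolingual clustering tool (e.g., Brown clustering) place translation pairs in the same cluster.

```python
import random

def build_mixed_corpus(source_sents, target_sents, s2t, t2s, replace_prob=0.5, seed=0):
    """Interleave source- and target-language sentences, randomly swapping each
    word for its dictionary translation (when one exists). Running a monolingual
    word-clustering tool on the result tends to place a word and its translation
    in the same cluster, yielding cross-lingual clusters."""
    rng = random.Random(seed)
    mixed = []
    for sents, table in ((source_sents, s2t), (target_sents, t2s)):
        for sent in sents:
            mixed.append([table[w] if w in table and rng.random() < replace_prob else w
                          for w in sent])
    return mixed

# Tiny, purely illustrative example (hypothetical data and dictionaries).
en = [["the", "dog", "barks"]]
es = [["el", "perro", "ladra"]]
en2es = {"the": "el", "dog": "perro"}
es2en = {"el": "the", "perro": "dog"}
print(build_mixed_corpus(en, es, en2es, es2en))
```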

Highlights

  • Creating manually-annotated syntactic treebanks is an expensive and time-consuming task

  • We describe a method for transferring lexical information from the target language into source-language treebanks, using word-to-word translation dictionaries derived from parallel corpora (a simplified sketch follows this list)

  • We describe an approach that gives significant improvements over the baseline. §3.1 describes a method for deriving cross-lingual clusters, allowing us to add cluster features φ^(c)(x, y) to the model. §3.2 describes a method for adding lexical features φ^(l)(x, y) to the model. §3.3 describes a method for integrating these techniques with the density-driven annotation projection method of Rasooli and Collins (2015)
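
To illustrate the lexicalization step referenced above (§3.2 and the second highlight), the sketch below maps the word forms of a source-language tree into the target language through a word-to-word translation dictionary while leaving POS tags and dependency heads untouched. It is a simplified illustration rather than the paper's full method; the `lexicalize` function, the tiny English tree, and the English-to-Spanish dictionary are hypothetical.

```python
from typing import Dict, List, Tuple

# A token is (form, POS tag, head index); indices are 0-based, head -1 marks the root.
Tree = List[Tuple[str, str, int]]

def lexicalize(tree: Tree, s2t: Dict[str, str], unk: str = "_UNK_") -> Tree:
    """Replace each source-language word form with a target-language form via a
    word-to-word translation dictionary, keeping POS tags and dependency heads
    unchanged. Words without a dictionary entry get a placeholder, so a parser
    trained on the output falls back on POS and cluster features for them."""
    return [(s2t.get(form.lower(), unk), pos, head) for form, pos, head in tree]

# Hypothetical English source tree turned into Spanish-lexicalized training data.
en_tree = [("The", "DET", 1), ("dog", "NOUN", 2), ("barks", "VERB", -1)]
en2es = {"the": "el", "dog": "perro", "barks": "ladra"}
print(lexicalize(en_tree, en2es))
# [('el', 'DET', 1), ('perro', 'NOUN', 2), ('ladra', 'VERB', -1)]
```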

Introduction

Creating manually-annotated syntactic treebanks is an expensive and time-consuming task, which motivates transferring syntactic knowledge to languages that lack such resources. The Bible data we use contains a much smaller set of sentences (around 24,000) than other translation corpora, for example Europarl (Koehn, 2005), which has around 2 million sentences per language pair; this makes it a considerably more challenging corpus to work with. We achieve 80.9% average unlabeled attachment score (UAS) on the languages used in previous work; in comparison, Zhang and Barzilay (2015), Guo et al. (2016), and Ammar et al. (2016b) achieve UAS of 75.4%, 76.3%, and 77.8%, respectively. All of these previous works make use of the much larger Europarl corpus to derive lexical representations. On the Universal Dependencies corpora, thirteen datasets (10 languages) reach accuracies higher than 80.0%.
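
For reference, the unlabeled attachment score (UAS) quoted above is simply the percentage of tokens whose predicted head matches the gold-standard head; a minimal sketch of the computation (with made-up head sequences) follows.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """UAS: percentage of tokens whose predicted head index equals the gold head."""
    assert gold_heads and len(gold_heads) == len(predicted_heads)
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return 100.0 * correct / len(gold_heads)

# Example: 4 of 5 tokens receive the correct head, so UAS = 80.0.
print(unlabeled_attachment_score([2, 0, 2, 5, 3], [2, 0, 2, 5, 2]))
```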

The Parsing Model
Data Assumptions
A Baseline Approach
Translation Dictionaries
Our Approach
Learning Cross-Lingual Clusters
Treebank Lexicalization
Data and Tools
Results on the Google Treebank
Related Work
Conclusions