Many Languages, One Parser

Waleed Ammar, Chris Dyer, George Mulcaire, Noah A. Smith, Miguel Ballesteros

doi:10.1162/tacl_a_00109

Open Access

https://doi.org/10.1162/tacl_a_00109

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Dec 1, 2016
Citations: 268
License type: CC BY

Affiliation: Carnegie Mellon University, University of Washington, University of Barcelona

Abstract

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
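As a rough illustration of the input representation described in the abstract, the sketch below builds one vector per token from the three feature groups: (i) a multilingual word embedding, (ii) a language identifier, and (iii) a language-specific fine-grained POS tag. Everything concrete here is an assumption made for illustration and is not taken from the paper: the dimensions, the toy vocabularies and tag names, the random vectors standing in for learned embeddings, the zero-vector fallback, and the choice to combine the groups by simple concatenation.

```python
import random

# Hypothetical dimensions, chosen only for illustration.
WORD_DIM, LANG_DIM, POS_DIM = 4, 2, 3

def make_table(keys, dim, seed):
    """Build a toy lookup table of random vectors (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}

# (i) Multilingual word embeddings: words from all languages share one table.
word_emb = make_table(["dog", "perro", "<unk>"], WORD_DIM, seed=0)
# (ii) Token-level language information: one vector per language ID.
lang_emb = make_table(["en", "es"], LANG_DIM, seed=1)
# (iii) Language-specific features: fine-grained POS tags keyed by (language, tag).
fine_pos_emb = make_table([("en", "NN"), ("es", "nc0s000")], POS_DIM, seed=2)

def token_vector(word, lang, fine_pos):
    """Concatenate the three feature groups into one input vector for a token."""
    w = word_emb.get(word, word_emb["<unk>"])
    l = lang_emb[lang]
    # Unknown fine-grained tags fall back to a zero vector in this sketch.
    p = fine_pos_emb.get((lang, fine_pos), [0.0] * POS_DIM)
    return w + l + p  # list concatenation, not element-wise addition

vec_en = token_vector("dog", "en", "NN")
vec_es = token_vector("perro", "es", "nc0s000")
print(len(vec_en), len(vec_es))
```

Because both languages map into vectors of the same shape, a single downstream parser could consume tokens from either language, while the language vector and the language-keyed POS table leave room for language-specific behavior.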

Full Text