Abstract

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. It requires only freely available parallel texts, along with taggers and parsers for resource-rich languages. An empirical evaluation across 30 test languages shows that our method consistently achieves top accuracies, close to established upper bounds, and outperforms several competitive baselines.
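The general idea of multi-source annotation projection can be illustrated with a small sketch. It assumes that each source-language translation has already been tagged and parsed by a resource-rich model, and that word alignments between each source sentence and the target sentence are available (for example, from an automatic word aligner run over the parallel text). Each source then votes for the POS tag and the head of every aligned target token, and the majority label wins. The `SourceSentence` class, the `project` function and the toy data are hypothetical. This is a generic voting scheme, not necessarily the authors' exact procedure. In particular, choosing each token's head independently does not guarantee a well-formed tree; a fuller system would enforce that with a spanning-tree decoder.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple

ROOT = -1  # head value marking the sentence root (hypothetical convention)


@dataclass
class SourceSentence:
    """A source-language sentence tagged and parsed by a resource-rich model."""
    tags: List[str]                    # one POS tag per source token
    heads: List[int]                   # head index per source token, ROOT for the root
    alignment: List[Tuple[int, int]]   # (source index, target index) word-alignment links


def _winner(votes: Counter) -> Optional[object]:
    """Return the most-voted label, or None if the token received no votes."""
    return votes.most_common(1)[0][0] if votes else None


def project(sources: List[SourceSentence], target_len: int):
    """Project POS tags and head attachments onto one target sentence by majority vote."""
    tag_votes = [Counter() for _ in range(target_len)]
    head_votes = [Counter() for _ in range(target_len)]
    for src in sources:
        # Map each source token to every target token it is aligned to.
        s2t = defaultdict(list)
        for s, t in src.alignment:
            s2t[s].append(t)
        for s, targets in s2t.items():
            for t in targets:
                # POS: each aligned source token votes for its own tag.
                tag_votes[t][src.tags[s]] += 1
                head = src.heads[s]
                if head == ROOT:
                    head_votes[t][ROOT] += 1
                    continue
                # Arcs: the source arc head -> s becomes a vote for the aligned target arc.
                for t_head in s2t.get(head, []):
                    if t_head != t:  # skip self-loops caused by many-to-one links
                        head_votes[t][t_head] += 1
    tags = [_winner(v) for v in tag_votes]
    heads = [_winner(v) for v in head_votes]
    return tags, heads


if __name__ == "__main__":
    # Invented toy data: three source translations of a 4-token target sentence
    # whose last token is unaligned, so it receives no projected labels.
    links = [(0, 0), (1, 1), (2, 2)]
    sources = [
        SourceSentence(["DET", "NOUN", "VERB"], [1, 2, ROOT], links),
        SourceSentence(["DET", "PROPN", "VERB"], [2, 2, ROOT], links),
        SourceSentence(["DET", "NOUN", "VERB"], [1, 2, ROOT], links),
    ]
    tags, heads = project(sources, target_len=4)
    print(tags)   # ['DET', 'NOUN', 'VERB', None]
    print(heads)  # [1, 2, -1, None]
```

Voting across several sources lets one noisy source (here, the second sentence's `PROPN` tag and its deviating head) be outvoted by the others, while unaligned target tokens are left unlabeled rather than guessed.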

Highlights

  • State-of-the-art approaches to inducing part-of-speech (POS) taggers and dependency parsers only scale to a small fraction of the world’s ∼6,900 languages

  • The major bottleneck is the lack of manually annotated resources for the vast majority of these languages, including languages spoken by millions, such as Marathi (73 million speakers), Hausa (50 million), and Kurdish (30 million)

  • One of the test resources does not entirely adhere to Universal Dependencies (UD); we provide a POS tagset mapping and a few modifications, and include it as a test language to deepen the robustness assessment of our approach across language families


Summary

Introduction

State-of-the-art approaches to inducing part-of-speech (POS) taggers and dependency parsers only scale to a small fraction of the world’s ∼6,900 languages. Cross-lingual transfer learning (or simply cross-lingual learning) refers to using annotated resources in other (source) languages to induce models for such low-resource (target) languages. Most work in cross-lingual learning makes assumptions about the availability of linguistic resources that do not hold for the majority of low-resource languages. The best cross-lingual dependency parsing results reported to date were presented by Rasooli and Collins (2015). They use the intersection of the languages covered by the Google dependency treebanks project and those contained in the Europarl corpus. They therefore only consider closely related Indo-European languages for which high-quality tokenization can be obtained with simple heuristics.
