Abstract

Neural dependency parsing has achieved remarkable performance for many domains and languages. However, its dependence on large amounts of labelled data limits the effectiveness of these approaches for low-resource languages. In this work, we focus on dependency parsing for morphologically rich languages (MRLs) in a low-resource setting. Although morphological information is essential for the dependency parsing task, the need for morphological disambiguation and the lack of powerful analyzers make this information difficult to obtain for MRLs. To address these challenges, we propose simple auxiliary tasks for pretraining. We perform experiments on 10 MRLs in low-resource settings to measure the efficacy of our proposed pretraining method and observe an average absolute gain of 2 points in unlabelled attachment score (UAS) and 3.6 points in labelled attachment score (LAS).
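The gains above are reported in the two standard dependency-parsing metrics: UAS, the percentage of tokens assigned the correct head, and LAS, the percentage assigned both the correct head and the correct dependency label. The sketch below computes both for token-aligned gold and predicted analyses. It assumes every token, including punctuation, is scored (evaluation setups differ on this), and the function name `attachment_scores` and the UD-style labels in the example are illustrative, not taken from the paper.

```python
from typing import List, Tuple

# Each token is annotated with (head_index, dependency_label); head 0 denotes the root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens.

    UAS counts tokens whose predicted head matches the gold head;
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    head_correct = 0
    labelled_correct = 0
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        if g_head == p_head:
            head_correct += 1
            if g_label == p_label:
                labelled_correct += 1
    n = len(gold)
    return 100.0 * head_correct / n, 100.0 * labelled_correct / n


# Toy four-token sentence: the third token has the right head but the wrong label,
# and the fourth token has the wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "nmod")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

Because both scores are percentages, an absolute gain of 3.6 LAS points means that, on average, 3.6 more tokens per hundred receive both the correct head and the correct label.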

Highlights

  • Dependency parsing has greatly benefited from neural network-based approaches

  • Input representation consists of 300-dimensional FastText (Grave et al, 2018) word embeddings and 100-dimensional character embeddings produced by a convolutional neural network (CNN) (Zhang et al, 2015)

  • We focus on dependency parsing for low-resource morphologically rich languages (MRLs), where obtaining morphological information is itself a challenge
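The highlights describe the parser's input layer as a 300-dimensional FastText word embedding combined with a 100-dimensional CNN-based character embedding. The sketch below shows one way such a per-token representation could be assembled, assuming the two vectors are concatenated into a 400-dimensional input. The character embedding size (30), window width (3), tanh activation, max-pooling, and all class and function names are hypothetical choices for illustration, and random values stand in for trained and pretrained parameters. It is plain Python for readability, not the authors' implementation.

```python
import math
import random
from typing import Dict, List

# WORD_DIM (300) and CHAR_OUT_DIM (100) follow the text; CHAR_DIM and WINDOW
# are illustrative assumptions, not values reported by the authors.
WORD_DIM = 300
CHAR_DIM = 30
CHAR_OUT_DIM = 100
WINDOW = 3

rng = random.Random(0)


def rand_vec(dim: int) -> List[float]:
    """Small random vector standing in for learned or pretrained parameters."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


class CharCNN:
    """Embed characters, convolve with CHAR_OUT_DIM filters, then max-pool over positions."""

    def __init__(self) -> None:
        self.char_emb: Dict[str, List[float]] = {}
        # Each filter spans WINDOW consecutive characters, each CHAR_DIM wide.
        self.filters = [[rand_vec(CHAR_DIM) for _ in range(WINDOW)]
                        for _ in range(CHAR_OUT_DIM)]
        self.bias = [0.0] * CHAR_OUT_DIM
        self.pad = [0.0] * CHAR_DIM

    def embed_char(self, c: str) -> List[float]:
        # Lazily assign a random vector to each new character (hypothetical vocabulary handling).
        if c not in self.char_emb:
            self.char_emb[c] = rand_vec(CHAR_DIM)
        return self.char_emb[c]

    def __call__(self, word: str) -> List[float]:
        if not word:
            raise ValueError("word must be non-empty")
        chars = [self.embed_char(c) for c in word]
        # Zero-pad both ends so every window overlaps at least one real character.
        padded = [self.pad] * (WINDOW - 1) + chars + [self.pad] * (WINDOW - 1)
        n_positions = len(padded) - WINDOW + 1
        features = []
        for filt, b in zip(self.filters, self.bias):
            best = -math.inf
            for i in range(n_positions):
                s = b
                for k in range(WINDOW):
                    s += sum(w * x for w, x in zip(filt[k], padded[i + k]))
                best = max(best, math.tanh(s))  # max-pool over character positions
            features.append(best)
        return features


class InputEncoder:
    """Concatenate a 300-d word vector with a 100-d character-CNN vector (400-d total)."""

    def __init__(self, pretrained: Dict[str, List[float]]) -> None:
        # `pretrained` is a plain dict standing in for FastText vectors; unlike real
        # FastText, unknown words get a zero vector instead of a subword-based one.
        self.pretrained = pretrained
        self.char_cnn = CharCNN()
        self.unk = [0.0] * WORD_DIM

    def encode(self, word: str) -> List[float]:
        return self.pretrained.get(word, self.unk) + self.char_cnn(word)


pretrained = {"rāmaḥ": rand_vec(WORD_DIM)}  # hypothetical toy vocabulary
encoder = InputEncoder(pretrained)
vectors = [encoder.encode(w) for w in ["rāmaḥ", "vanam", "gacchati"]]
print(len(vectors), len(vectors[0]))  # 3 400
```

Max-pooling over character positions yields a fixed-size vector regardless of word length, so inflected forms of different lengths map to inputs of the same dimensionality, and the character channel still carries a signal for words missing from the word-embedding vocabulary.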


Summary

Introduction

Dependency parsing has greatly benefited from neural network-based approaches. While these approaches simplify the parsing architecture and eliminate the need for hand-crafted feature engineering (Chen and Manning, 2014; Dyer et al, 2015; Kiperwasser and Goldberg, 2016; Dozat and Manning, 2017; Kulmizev et al, 2019), their performance has been less impressive for several morphologically rich languages (MRLs) and low-resource languages (More et al, 2019; Seeker and Çetinoğlu, 2015). Several approaches have been suggested for improving the parsing performance of low-resource languages. These include data augmentation strategies, cross-lingual transfer (Vania et al, 2019), and the use of unlabelled data through semi-supervised learning (Clark et al, 2018) and self-training (Rotman and Reichart, 2019). Incorporating morphological knowledge substantially improves parsing performance for MRLs, including low-resource languages (Vania et al, 2018; Dehouck and Denis, 2018). This aligns well with the linguistic intuition that morphological markers, especially case markers, play a role in deciding the syntactic roles of the words involved (Wunderlich and Lakämper, 2001; Sigurðsson, 2003; Kittilä et al, 2011). We primarily focus on one such morphologically rich low-resource language, Sanskrit.

