Abstract

While the application of word embedding models to downstream Natural Language Processing (NLP) tasks has been shown to be successful, the benefits for low-resource languages are somewhat limited due to the lack of adequate data for training the models. However, NLP research efforts for low-resource languages have focused on ways to harness pre-trained models to improve the performance of NLP systems built to process these languages, without the need to re-invent the wheel. Welsh is one such language. In this paper, we present the results of our experiments on learning a simple multi-task neural network model for part-of-speech and semantic tagging for Welsh using a pre-trained embedding model from FastText. Our model’s performance was compared with that of the existing rule-based stand-alone taggers for part-of-speech and semantic tagging. Despite its simplicity and its capacity to perform both tasks simultaneously, our tagger compared very well with the existing taggers.

Summary

Introduction

The Welsh language can be classified as low-resourced in the context of natural language processing because it lacks the resources commonly used in language research, such as large annotated corpora, as well as the standard computational tools and techniques for processing these resources. There is still a long way to go for Welsh, but the situation is improving. Given the potential challenges with the existing approaches and considering the similarities between the tasks of part-of-speech (POS) and semantic (SEM) annotation, we propose to train a single neural network model that can jointly learn both tasks. The main contributions of this research include: (1) The first application of multi-task learning to POS and semantic tagging for any language that we know of, (2) The ability to improve out-of-vocabulary (OOV) coverage for the Welsh language using pre-trained embeddings for semantic category extension, (3) Public release of two sets of manually checked gold-standard corpora for POS and semantic tagging of Welsh, (4) Inter-annotator agreement scores for Welsh semantic tagging, (5) Public release of the first Welsh semantic tagger (CySemTagger), (6) The first demonstration of multi-task learning to improve NLP task accuracy for Welsh, and (7) A demonstration of the usefulness of multi-task learning in a mono-lingual setting for a low re-. There is very little research that applies multi-task learning to link Word Sense Disambiguation (WSD) or semantic tagging with another task. Semantic tagging in multiple languages has been shown to benefit greatly from POS tagging in the NLP pipeline, since POS tags can help to filter out inapplicable semantic fields from the set of possible candidates (Piao et al., 2015).
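
To make the idea of a single model that jointly learns both tasks more concrete, the following is a minimal sketch of one possible arrangement: a shared hidden layer over frozen word vectors feeding two softmax heads, with the two cross-entropy losses summed. It is not the authors' architecture, which this summary does not specify. The tag sets, layer sizes, function names, and the randomly generated vectors standing in for pre-trained FastText embeddings are all assumptions made for illustration, and no training loop is shown.

```python
import math
import random

random.seed(0)

EMB_DIM, HIDDEN = 4, 3                      # illustrative sizes (assumption)
POS_TAGS = ["NOUN", "VERB"]                 # hypothetical POS tag set
SEM_TAGS = ["ANIMAL", "MOVEMENT", "OTHER"]  # hypothetical semantic tag set


def rand_matrix(rows, cols):
    """Small random weight matrix as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Stand-in for frozen pre-trained embeddings: random vectors, NOT real FastText output.
EMBEDDINGS = {
    word: [random.uniform(-1, 1) for _ in range(EMB_DIM)]
    for word in ["cath", "rhedeg"]          # Welsh: "cat", "to run"
}

W_SHARED = rand_matrix(HIDDEN, EMB_DIM)     # parameters shared by both tasks
W_POS = rand_matrix(len(POS_TAGS), HIDDEN)  # POS-specific output head
W_SEM = rand_matrix(len(SEM_TAGS), HIDDEN)  # semantic-specific output head


def forward(word):
    """Return (POS probabilities, semantic probabilities) for one word."""
    hidden = [math.tanh(h) for h in matvec(W_SHARED, EMBEDDINGS[word])]
    return softmax(matvec(W_POS, hidden)), softmax(matvec(W_SEM, hidden))


def joint_loss(word, gold_pos, gold_sem):
    """Sum of the two cross-entropy losses, so both tasks update the shared layer."""
    pos_probs, sem_probs = forward(word)
    pos_loss = -math.log(pos_probs[POS_TAGS.index(gold_pos)])
    sem_loss = -math.log(sem_probs[SEM_TAGS.index(gold_sem)])
    return pos_loss + sem_loss


if __name__ == "__main__":
    print(forward("cath"))
    print(joint_loss("cath", "NOUN", "ANIMAL"))
```

Because both heads read the same hidden representation, gradients from either task would adjust `W_SHARED` during training, which is the mechanism by which one task can inform the other. A realistic tagger would also model sentence context, and would use real FastText vectors, whose subword information is what allows embeddings to be produced for OOV words.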
