Morphological disambiguation from stemming data

Antoine Nzeyimana

doi:10.18653/v1/2020.coling-main.409

Abstract

Morphological analysis and disambiguation is an important task and a crucial preprocessing step in natural language processing of morphologically rich languages. Kinyarwanda, a morphologically rich language, currently lacks tools for automated morphological analysis. While linguistically curated finite-state tools can be easily developed for morphological analysis, the morphological richness of the language means that many ambiguous analyses are produced, requiring effective disambiguation. In this paper, we propose learning to morphologically disambiguate Kinyarwanda verbal forms from a new stemming dataset collected through crowd-sourcing. Using feature engineering and a feed-forward neural network-based classifier, we achieve about 89% non-contextualized disambiguation accuracy. Our experiments reveal that inflectional properties of stems and morpheme association rules are the most discriminative features for disambiguation.
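To illustrate what "non-contextualized disambiguation with a feed-forward classifier" can look like, here is a minimal sketch, assuming each candidate analysis of a word has already been turned into a fixed-length feature vector. The feature names, the one-hidden-layer shape, the softmax over a word's candidates, and the hand-set (untrained) weights are all hypothetical choices for illustration, not the paper's architecture. "Non-contextualized" here means each word is disambiguated on its own, without looking at neighbouring words.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Affine layer: weights[j] holds the incoming weights of output unit j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(xs: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in xs]


class CandidateScorer:
    """One-hidden-layer feed-forward network mapping a candidate analysis's features to a score."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def score(self, features: Sequence[float]) -> float:
        hidden = relu(dense(features, self.w1, self.b1))
        return dense(hidden, [self.w2], [self.b2])[0]

    def probabilities(self, candidates: Sequence[Sequence[float]]) -> Vector:
        # Softmax over the competing analyses of a single word (max-shifted for stability).
        scores = [self.score(c) for c in candidates]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, candidates: Sequence[Sequence[float]]) -> int:
        probs = self.probabilities(candidates)
        return max(range(len(candidates)), key=probs.__getitem__)


def accuracy(scorer: CandidateScorer, dataset: Sequence[Tuple[Sequence[Sequence[float]], int]]) -> float:
    """Fraction of words whose highest-scoring analysis is the gold one (no sentence context used)."""
    correct = sum(1 for candidates, gold in dataset if scorer.predict(candidates) == gold)
    return correct / len(dataset)


# Hypothetical features per candidate:
# [stem_can_inflect_this_way, morpheme_pair_is_licensed, num_morphemes / 10]
# Hand-set, untrained weights for illustration only.
scorer = CandidateScorer(
    w1=[[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    b1=[0.0, 0.0],
    w2=[1.0, -0.5],
    b2=0.0,
)

word_1 = [[1.0, 1.0, 0.4], [0.0, 1.0, 0.3], [1.0, 0.0, 0.5]]
word_2 = [[0.0, 0.0, 0.2], [1.0, 1.0, 0.6]]
dataset = [(word_1, 0), (word_2, 0)]

print(scorer.predict(word_1))     # 0
print(accuracy(scorer, dataset))  # 0.5
```

Scoring every candidate with the same network and normalizing within the word lets the model handle words with any number of analyses; framing it instead as independent binary classification per candidate would be an equally plausible design.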

Highlights

Morphological analysis and disambiguation plays a critical role in most natural language processing (NLP) tasks
When inflections are generated by piecing together multiple morphemes, a large and sparse vocabulary is produced, requiring tools to unpack the individual morphemes for downstream NLP tasks such as information extraction and machine translation
Research on NLP for low resource languages lags behind recent advancements made for NLP on high resource languages
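The highlight about large and sparse vocabularies follows from simple arithmetic: if a verb is built by filling ordered morpheme slots, the number of surface forms per root is the product of the slot inventory sizes. The sketch below assumes a hypothetical, simplified template with placeholder morphemes (not an actual Kinyarwanda inventory) and ignores the sound changes that real forms undergo.

```python
from itertools import product
from math import prod

# Hypothetical, simplified verb template: ordered slots with placeholder morphemes.
# "" marks an optional slot left empty. This is NOT a real Kinyarwanda inventory.
SLOTS = {
    "subject": ["SM1", "SM2", "SM3"],
    "tense": ["T1", "T2"],
    "object": ["", "OM1", "OM2"],
    "root": ["ROOT"],
    "extension": ["", "EXT1", "EXT2", "EXT3"],
    "final_vowel": ["FV1", "FV2"],
}


def count_forms(slots):
    """Number of slot fillings = product of the inventory sizes."""
    return prod(len(options) for options in slots.values())


def generate_forms(slots):
    """Naive concatenation of every slot filling (real forms also undergo sound changes)."""
    return ["-".join(m for m in combo if m) for combo in product(*slots.values())]


forms = generate_forms(SLOTS)
print(count_forms(SLOTS))  # 144 forms from a single root
print(forms[0])            # SM1-T1-ROOT-FV1
```

Because the count grows multiplicatively with each slot, most possible forms of a root are unlikely to appear in any given corpus, which is why tools that split words back into morphemes help downstream tasks.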

Summary

Introduction

Morphological analysis and disambiguation plays a critical role in most natural language processing (NLP) tasks. While several morphologically rich languages such as Turkish, Arabic and Modern Hebrew already have mature tools for morphological segmentation (Çöltekin, 2010; Çöltekin, 2014; Itai and Segal, 2003; Habash and Rambow, 2006), Kinyarwanda still lacks appropriate tools for the task. A key obstacle is the need for high-quality datasets manually annotated by language experts. We instead leverage an easy-to-collect stemming dataset and transform it into a resource for morphological disambiguation. Collecting stemming data is much faster and less error-prone than producing full morphological segmentations, which require subtle linguistic knowledge.
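One plausible way to turn stem annotations into disambiguation labels (an assumption, not necessarily the paper's exact procedure) is to run the finite-state analyzer on each annotated word and mark as correct every candidate analysis whose stem agrees with the crowd-sourced stem. In this sketch, the `Analysis` type, the toy analyzer, and all words and morphemes are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Analysis:
    """One candidate analysis from a (hypothetical) finite-state analyzer."""
    stem: str
    morphemes: Tuple[str, ...]


Example = Tuple[str, List[Analysis], List[int]]


def label_candidates(annotated_stem: str, candidates: List[Analysis]) -> List[int]:
    """1 for each candidate whose stem agrees with the crowd-sourced stem, else 0."""
    return [1 if c.stem == annotated_stem else 0 for c in candidates]


def build_examples(
    stem_annotations: Dict[str, str],
    analyzer: Callable[[str], List[Analysis]],
) -> List[Example]:
    examples: List[Example] = []
    for word, stem in stem_annotations.items():
        candidates = analyzer(word)
        labels = label_candidates(stem, candidates)
        # Keep a word only if its stem annotation singles out exactly one analysis;
        # words with zero or several matches are set aside.
        if sum(labels) == 1:
            examples.append((word, candidates, labels))
    return examples


# Toy analyzer output with placeholder words and morphemes.
TOY_ANALYSES: Dict[str, List[Analysis]] = {
    "wordA": [Analysis("stemX", ("PFX", "stemX")), Analysis("stemY", ("PFX", "stemY", "SFX"))],
    "wordB": [Analysis("stemZ", ("PFX1", "stemZ")), Analysis("stemZ", ("PFX2", "stemZ"))],
    "wordC": [Analysis("stemP", ("stemP", "SFX"))],
}


def toy_analyzer(word: str) -> List[Analysis]:
    return TOY_ANALYSES.get(word, [])


annotations = {"wordA": "stemX", "wordB": "stemZ", "wordC": "stemQ"}
examples = build_examples(annotations, toy_analyzer)
print([(word, labels) for word, _, labels in examples])  # [('wordA', [1, 0])]
```

Stem agreement alone cannot separate analyses that share a stem (as with `wordB` here), so this filtering rule is only a starting point; how such leftover cases are handled is a design choice the sketch simply skips.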

Publication Date: Jan 1, 2020
Citations: 1
License type: cc-by

Similar Papers

Morphological Disambiguation for Turkish
Dilek Zeynep Hakkani-Tür ... Deniz Yuret
01 Jan 2018

Morphological Disambiguation and Text Normalization for Southern Quechua Varieties
Annette Rios Gonzales ... Richard Alexander Castro Mamani
01 Jan 2014

Transmorph: a transformer based morphological disambiguator for Turkish
Hilal Özer ... Emin Erkan Korkmaz
Turkish Journal of Electrical Engineering and Computer Sciences | VOL. 30
01 Jul 2022

Tagging and morphological disambiguation of Turkish text
Kemal Oflazer ... İlker Kuruöz
01 Jan 1993
