Canonical Segmentation and Syntactic Morpheme Tagging of Four Resource-scarce Nguni Languages

Martin Puttkammer, Jakobus S Du Toit

doi:10.55492/dhasa.v3i03.3818

Abstract

Morphological analysis involves investigating the syntactic class of a word, but can also extend to the decomposition and syntactic analysis of its underlying morpheme composition. This is especially relevant to languages with an agglutinative writing system, where multiple linguistic words are expressed as a single orthographic word. In this paper, we propose a memory-based approach to canonical segmentation that uses windowing to recover the uncondensed morphemes that differ from the surface form of a word. Additionally, we propose treating the syntactic labelling of morphemes as a sequence labelling task, similar to part-of-speech tagging. This approach leverages the internal morpheme composition of a word as local context, in much the same way that the surrounding sentence of a word serves in the disambiguation of its part of speech. Both tasks are modelled separately but performed sequentially, by cascading the decomposed morphemes of a word into the task of syntactic labelling. When evaluated on four resource-scarce, conjunctively written Nguni languages, the proposed approach achieves an overall accuracy ranging between 82% and 92%, which outperforms previously developed rule-based analysers for the same languages.
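The windowing idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each character of a word becomes one instance made up of the character itself plus a fixed number of neighbouring characters on either side, with a padding symbol beyond the word boundaries. The function name `character_windows`, the window sizes, and the padding symbol are hypothetical and are not taken from the paper; the class label that a memory-based learner would predict for each instance is omitted.

```python
from typing import List, Tuple


def character_windows(
    word: str, left: int = 3, right: int = 3, pad: str = "_"
) -> List[Tuple[str, ...]]:
    """Return one fixed-width character window per character of `word`.

    Each window holds `left` context characters, the focus character,
    and `right` context characters. Positions that fall outside the
    word are filled with `pad`, which must be a single character.
    """
    if len(pad) != 1:
        raise ValueError("pad must be a single character")
    padded = pad * left + word + pad * right
    width = left + 1 + right
    # Window i starts at padded[i], so its focus character is word[i].
    return [tuple(padded[i:i + width]) for i in range(len(word))]


if __name__ == "__main__":
    # Print the windows for an example orthographic word.
    for window in character_windows("ngiyabonga"):
        print(" ".join(window))
```

In a setup like this, every window would be paired with a label describing what happens at the focus character, and a classifier would compare new windows against stored ones. How the labels are defined and which context sizes work best are design choices this sketch leaves open.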

Journal: Journal of the Digital Humanities Association of Southern Africa (DHASA)
Publication Date: Jan 1, 2021
License type: cc-by-sa

