Reduplication in Assamese: Identification and Modeling

Dhrubajyoti Pathak, Priyankoo Sarmah, Sukumar Nandi

doi:10.1145/3510419

Abstract

Reduplication is a productive morphological process found in a substantial number of the world's languages. It is a well-studied phenomenon in linguistics, and several typological works have documented different types of reduplication in most of these languages. Handling reduplication plays a vital role in the performance of POS taggers, sentiment analysis, and other NLP tasks. However, it remains understudied in computational linguistics, especially for low-resource languages like Assamese. This article first describes the different types of reduplication that occur in Assamese and their shapes. Second, it compiles an exhaustive set of reduplication formation rules and incorporates them into a system that identifies reduplication in Assamese text. In experiments on datasets from three different domains, the rule-based system identified reduplicated expressions with average precision, recall, and F1 scores of 94.19%, 98.07%, and 96.07%, respectively. Third, it shows that Assamese reduplication processes can be captured by a two-way finite-state transducer (2-way FST). Finally, it presents two broad categories of reduplicative processes along with their corresponding 2-way FST models.
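To make the 2-way FST idea concrete, here is a minimal Python sketch of how a transducer whose read head can move in both directions produces total (full) reduplication. It is not the authors' model: the state names, the boundary markers `<` and `>`, the function name `reduplicate`, and the optional `separator` are all assumptions made for illustration, and the input is a placeholder string rather than attested Assamese data. The sketch also treats each Unicode code point as one tape symbol, which is a simplification for Assamese script.

```python
# Hypothetical boundary markers; a real model would choose symbols
# that cannot occur in the input alphabet.
LEFT_END, RIGHT_END = "<", ">"


def reduplicate(word: str, separator: str = "") -> str:
    """Simulate a 2-way FST that maps `word` to `word + separator + word`.

    The machine reads the tape left to right while copying, moves the
    head back to the left boundary without writing, then copies again.
    """
    tape = [LEFT_END, *word, RIGHT_END]
    pos, state = 0, "start"
    out = []

    while state != "halt":
        sym = tape[pos]
        if state == "start":
            # Head is on the left boundary: step right, begin first copy.
            state, pos = "copy1", pos + 1
        elif state == "copy1":
            if sym == RIGHT_END:
                # First copy done: emit the separator and start moving left.
                out.append(separator)
                state, pos = "rewind", pos - 1
            else:
                out.append(sym)
                pos += 1
        elif state == "rewind":
            if sym == LEFT_END:
                # Back at the start: step right, begin second copy.
                state, pos = "copy2", pos + 1
            else:
                pos -= 1  # move left, write nothing
        elif state == "copy2":
            if sym == RIGHT_END:
                state = "halt"
            else:
                out.append(sym)
                pos += 1

    return "".join(out)


if __name__ == "__main__":
    print(reduplicate("abc"))
    print(reduplicate("abc", separator="-"))
```

The point of the leftward `rewind` state is that the machine needs only a fixed number of states regardless of word length, whereas a one-way transducer would have to remember the whole word in its states to copy it a second time.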

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: May 17, 2022
Citations: 6
