Abstract

Reduplication is a productive morphological process found in a substantial number of the world's languages. It is a well-studied phenomenon, and several typological works have provided evidence for different types of reduplication in most of these languages. Handling reduplication correctly is important for the performance of POS taggers, sentiment analysis, and other NLP tasks. However, it remains an understudied area in computational linguistics, especially for low-resource languages such as Assamese. This article first describes the types of reduplication that occur in Assamese and their shapes. Second, it compiles an exhaustive set of reduplication formation rules and incorporates them into a system that identifies reduplication in Assamese text. Experiments on datasets from three different domains show that the rule-based system identifies reduplicated expressions with average precision, recall, and F1 scores of 94.19%, 98.07%, and 96.07%, respectively. Third, it shows that Assamese reduplication processes can be captured by a two-way finite-state transducer (2-way FST). Finally, it presents two broad categories of reduplicative processes along with their corresponding 2-way FST models.
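As a rough illustration of what rule-based identification and its evaluation can look like, the following minimal sketch flags adjacent token pairs under two simplified rules and scores the result. It is not the system described in this article: the two rules (exact repetition, and an "echo" pair differing only in its first character), the function names, and the toy romanized tokens are all assumptions made for the example. Real Assamese text would also need grapheme-aware comparison rather than plain character slicing.

```python
from typing import List, Set, Tuple


def find_reduplication(tokens: List[str]) -> List[Tuple[int, str, str, str]]:
    """Return (index, first, second, kind) for adjacent pairs matching a toy rule.

    Hypothetical rules, assumed for illustration only:
      - "full": the two tokens are identical
      - "echo": same length, identical except for the first character
    """
    hits = []
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        if a == b:
            hits.append((i, a, b, "full"))
        elif len(a) > 1 and len(a) == len(b) and a[1:] == b[1:]:
            hits.append((i, a, b, "echo"))
    return hits


def precision_recall_f1(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Score predicted pair positions against gold-standard positions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy romanized tokens, not taken from the article's datasets.
tokens = ["lahe", "lahe", "kitap", "sitap", "ghar"]
hits = find_reduplication(tokens)
predicted_positions = {index for index, _, _, _ in hits}
```

Here `hits` holds one "full" pair at position 0 and one "echo" pair at position 2. Passing `predicted_positions` and a hand-annotated set of gold positions to `precision_recall_f1` gives per-dataset scores, which could then be averaged across domains.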
