Abstract
The proposed research investigates a novel approach to part-of-speech (POS) tagging in the Assamese language using character-level Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. The work contributes to Natural Language Processing (NLP) by exploring how well these models assign grammatical labels (POS tags) to individual words in Assamese sentences. The corpus comprises 60,000 Assamese words annotated with the LDCIL Assamese tagset. It is split in an 80:20 ratio: 80% of the corpus is used to train the models, and the remaining 20% is used for evaluation. The character-level LSTM model achieves an accuracy of 92.80%, while the character-level Bi-LSTM model surpasses it with an accuracy of 93.36%. This performance exceeds that reported in existing research on Assamese POS tagging. The results broaden the understanding of POS tagging in Assamese and offer findings that could be applied to other languages with similar characteristics.
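The evaluation protocol summarized above (an 80:20 train/test split and a single accuracy figure per model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not say whether the split is made over sentences or words, or how accuracy is computed. The sketch assumes a shuffled split over whole sentences and token-level accuracy, meaning the fraction of words whose predicted tag matches the gold tag. The helper `char_indices` shows the kind of character-level input a character-level LSTM or Bi-LSTM consumes; here each Unicode code point is treated as one character, which is also an assumption. The function names, the fixed `seed`, and the reserved id 0 for unknown characters are hypothetical choices. The neural models themselves are omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

# A tagged sentence is a list of (word, tag) pairs, e.g. tags from the LDCIL tagset.
Sentence = List[Tuple[str, str]]


def split_80_20(sentences: Sequence[Sentence], seed: int = 0) -> Tuple[List[Sentence], List[Sentence]]:
    """Hypothetical helper: shuffle whole sentences, then keep 80% for training and 20% for testing."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def char_indices(word: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical helper: map each Unicode code point of a word to an integer id (0 = unknown)."""
    return [vocab.get(ch, 0) for ch in word]


def token_accuracy(gold: Sequence[Sentence], predicted: Sequence[Sequence[str]]) -> float:
    """Assumed metric: fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent, tags in zip(gold, predicted):
        for (_, gold_tag), pred_tag in zip(sent, tags):
            correct += gold_tag == pred_tag  # bool counts as 0 or 1
            total += 1
    return correct / total if total else 0.0
```

Splitting over whole sentences, rather than individual words, keeps each test sentence intact. This matters for a Bi-LSTM, which uses context from both directions within a sentence.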