Abstract
Part-of-Speech (POS) tagging is an important task in Natural Language Processing, and numerous taggers have been developed for several languages. Many POS taggers have also been developed for Sanskrit, one of the oldest languages in the world. However, machine learning based POS tagging has received less attention. In this paper, various deep learning algorithms are used to implement a POS tagger for Sanskrit. The problem is framed as a sequence labeling problem at the character level: a word to be tagged is treated as a sequence of characters, and the sequential relationship among the characters in the word is captured with deep learning algorithms such as Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) and their bidirectional versions. The character level formulation reduces the memory requirement compared to word level implementations and also increases labeling accuracy. The performance of the labeling task was analyzed with different combinations of hyper-parameters. We obtained an accuracy of 97.86% with the Bidirectional GRU. The character level implementations of both the unidirectional and bidirectional forms of RNN, LSTM and GRU outperformed all word level implementations in terms of accuracy, number of trainable parameters and storage requirement.
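To make the character level framing concrete, the following is a minimal sketch of how a word might be turned into a fixed-length sequence of character indices before being passed to an RNN, LSTM or GRU. It is an illustration only, not the authors' preprocessing: the helper names (`build_char_vocab`, `encode_word`), the reserved `PAD` and `UNK` indices, the maximum length of 8, and the three ASCII-transliterated toy words are all assumptions introduced here.

```python
from typing import Dict, List

PAD = 0  # assumed index reserved for padding positions
UNK = 1  # assumed index reserved for characters not seen when building the vocabulary


def build_char_vocab(words: List[str]) -> Dict[str, int]:
    """Map every distinct character in `words` to an integer index starting at 2."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode_word(word: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode a word as a list of character indices, truncated or padded to max_len."""
    ids = [vocab.get(ch, UNK) for ch in word[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Toy, ASCII-transliterated words chosen for illustration (not taken from the paper's corpus).
toy_words = ["ramah", "vanam", "gacchati"]
vocab = build_char_vocab(toy_words)

encoded = encode_word("ramah", vocab, max_len=8)
print(encoded)  # [9, 2, 7, 2, 5, 0, 0, 0]
```

Under this framing the vocabulary contains only the distinct characters (10 in the toy example) rather than every distinct word form, and each encoded sequence would serve as the input to the recurrent model that predicts the word's tag.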