Abstract

Religious texts follow distinctive writing patterns and styles and draw on specialized vocabularies. These texts therefore carry characteristic semantic and syntactic features that demand special attention in Natural Language Processing (NLP) tasks. This paper applies Deep Learning (DL) techniques to Part-of-Speech (PoS) tagging of religious texts in the Assamese language. We created a specialized dataset of approximately 11,000 sentences, collected by web crawling and by filtering religious texts from existing corpora. Linguists manually validated the dataset to check tagging accuracy, identify errors, and apply the required corrections. A performance matrix was constructed to analyze the initial tagging, which was produced by a pre-existing DL-based Assamese Universal Part-of-Speech (UPoS) tagger. Following this, a subset of the dataset was manually evaluated, and the validated data was then treated as a gold-standard training set for further DL models based on GRU, RNN, and Bidirectional LSTM (BiLSTM) architectures. Training accuracies were recorded and are presented to demonstrate the effectiveness of the proposed approach. Accuracy, precision, and recall were recorded for all three models, and F1 scores were also calculated. Training and testing accuracies are compared in performance graphs.
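
As an illustration of the evaluation measures named above, the following is a minimal sketch of how token-level accuracy, precision, recall, and F1 could be computed from gold and predicted tag sequences. The function name `tag_metrics`, the macro-averaging over tags, and the example tags are assumptions made for illustration; the abstract does not state which averaging scheme or tooling was used.

```python
from collections import Counter


def tag_metrics(gold, pred):
    """Token-level accuracy plus macro-averaged precision, recall and F1.

    Hypothetical helper: macro-averaging over tags is an assumption,
    not something stated in the abstract.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct tag
        else:
            fp[p] += 1          # predicted tag was wrong
            fn[g] += 1          # gold tag was missed

    tags = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        prec = tp[t] / p_den if p_den else 0.0
        rec = tp[t] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(tags)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with UPoS-style tags (illustrative data only).
gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "NOUN", "NOUN", "ADJ"]
scores = tag_metrics(gold, pred)
```

Macro-averaging weights every tag equally, which makes rare tags visible in the score; a frequency-weighted or micro average would be an equally reasonable choice depending on the reporting convention.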
