Abstract

Language is a fundamental medium of human communication, in both spoken and written forms, each governed by grammatical rules. Sindhi, one of the oldest languages, has a rich morphology and grammatical structure. Part-of-speech (POS) tagging, a core task in natural language processing, assigns a grammatical tag to each word. This research presents a deep learning approach to POS tagging for Sindhi text. We developed POS taggers based on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, of which the LSTM proved more effective. This study represents the first application of these deep learning methods to POS tagging in Sindhi. Using fastText, we trained 79,959 Sindhi word vectors derived from a corpus compiled from diverse sources, including Sindhi books, stories, and poetry. The corpus comprises 1,459 sentences and 10,584 unique words, split into 80% for training and 20% for validation. The LSTM model achieved an accuracy of 85.80%, outperforming the GRU model (80.77%) by about 5 percentage points. The novelty of this work lies in applying deep learning techniques to improve POS tagging accuracy on a Sindhi corpus.
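The abstract reports an 80/20 training/validation split and a single accuracy figure per model, but does not say how either was computed. The sketch below illustrates one common way to do both, assuming (not confirmed by the abstract) that the split is made over whole sentences and that accuracy is measured per token. The function names `split_corpus` and `token_accuracy`, the fixed shuffle seed, and the tag labels in the usage lines are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple

# A tagged sentence is a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def split_corpus(sentences: Sequence[TaggedSentence], train_frac: float = 0.8,
                 seed: int = 0) -> Tuple[List[TaggedSentence], List[TaggedSentence]]:
    """Shuffle whole sentences, then split them into training and validation sets.

    Splitting at the sentence level (an assumption) keeps each sentence's
    word order intact, which a sequence model such as an LSTM or GRU needs.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # hypothetical fixed seed for reproducibility
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def token_accuracy(gold: Sequence[TaggedSentence],
                   predicted: Sequence[Sequence[str]]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent, tags in zip(gold, predicted):
        if len(sent) != len(tags):
            raise ValueError("prediction length does not match sentence length")
        for (_, gold_tag), pred_tag in zip(sent, tags):
            correct += int(gold_tag == pred_tag)
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Dummy corpus with the same sentence count as the abstract (1,459).
    corpus = [[(f"w{i}", "NN"), (f"v{i}", "VB")] for i in range(1459)]
    train, val = split_corpus(corpus)
    print(len(train), len(val))  # 1167 292

    # Hypothetical tags: 2 of the 3 tokens are tagged correctly.
    gold = [[("a", "NN"), ("b", "VB")], [("c", "JJ")]]
    pred = [["NN", "NN"], ["JJ"]]
    print(token_accuracy(gold, pred))
```

With 1,459 sentences, truncating 80% gives 1,167 training and 292 validation sentences; the paper may have rounded differently.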
