Hierarchical self attention based sequential labelling model for Bhojpuri, Maithili and Magahi languages

Rajesh Kumar Mundotiya, Swasti Mishra, Anil Kumar Singh

doi:10.1016/j.jksuci.2021.09.022

Abstract

Sequential labelling plays a vital role in numerous Natural Language Processing (NLP) applications, such as Machine Translation and Information Extraction. Two sequential labelling tasks are Part-of-Speech (POS) tagging, which assigns a sequence of grammatical categories to the words of a given sentence, and Chunking, which groups those words into ‘chunks’, or what can be called minimal phrases. Bhojpuri, Maithili and Magahi are low-resource languages of the Indo-Aryan language family, widely spoken in central north-eastern India. Creating an annotated corpus for POS tagging and Chunking, and then building initial automatic tools for these problems, is the first attempt towards building language technology tools for these languages. The annotated corpus is used to develop POS taggers and chunkers based on various machine learning algorithms (TnT, CRF, MEMM and Structured SVM) and on the state-of-the-art LSTM-CNN-CRF model. Their results are then compared with those of two newly proposed deep learning-based models: Self-Attention Hierarchical Bi-LSTM CRF (SAHBiLC) and a fine-tuned version of it, Fine-SAHBiLC. The SAHBiLC and Fine-SAHBiLC models outperform the other models on Bhojpuri (accuracy of 0.86 for POS tagging and 0.94 for Chunking), Maithili (accuracy of 0.86 for POS tagging and 0.95 for Chunking) and Magahi (accuracy of 0.86 for POS tagging).
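To make the two tasks concrete, the following minimal sketch shows how per-token labels can be turned into chunks and how token-level accuracy can be computed. It is an illustration only, not the authors' model or corpus: the English toy sentence, the tag names, the BIO encoding of chunk labels, and the helper names `group_chunks` and `token_accuracy` are all assumptions introduced here.

```python
from typing import List, Tuple


def group_chunks(tokens: List[str], chunk_tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into (label, tokens) chunks from BIO-style tags (assumed encoding)."""
    chunks: List[Tuple[str, List[str]]] = []
    current = None  # the chunk currently being extended, if any
    for token, tag in zip(tokens, chunk_tags):
        prefix, _, label = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[0] != label)
        )
        if starts_new:
            # "B-X" opens a chunk; a stray "I-X" is treated as opening one too.
            current = (label, [token])
            chunks.append(current)
        elif prefix == "I":
            current[1].append(token)
        else:
            # "O" (outside any chunk) closes the open chunk.
            current = None
    return chunks


def token_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of tokens whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy example (hypothetical data, not taken from the paper's corpus).
tokens = ["The", "small", "dog", "barked", "loudly"]
gold_pos = ["DET", "ADJ", "NOUN", "VERB", "ADV"]
pred_pos = ["DET", "NOUN", "NOUN", "VERB", "ADV"]
chunk_tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-ADVP"]

chunks = group_chunks(tokens, chunk_tags)
accuracy = token_accuracy(gold_pos, pred_pos)
```

Here one label per token is the output of a sequential labeller, and accuracy is reported as a fraction between 0 and 1, the same scale as the figures quoted in the abstract.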


Journal: Journal of King Saud University - Computer and Information Sciences
Publication Date: Oct 7, 2021
Citations: 1
License type: cc-by-nc-nd


