Depicting a Neural Model for Lemmatization and POS Tagging of Words from Palaeographic Stone Inscriptions

S. Ezhilarasi, P. Uma Maheswari

doi:10.1109/iciccs51141.2021.9432315

Abstract

Lemmatization is an essential step before part-of-speech (POS) tagging: it supports morphological analysis by removing inflections and returning the base form of a word without its endings. POS tagging assigns each word in a text to a grammatical category and thereby marks up the words of a script linguistically. Because Tamil words are heavily combined and inflected, lemmatization and the classification and prediction of POS tags are difficult. Automated tools are rare even for modern Tamil, and such statistical methods and techniques are lacking for palaeographic Tamil, such as the texts of stone inscriptions, where words are combined, stacked, overlapped and compounded without being split into morphemes or lemmas. The proposed work addresses the complexity of splitting and classifying these ancient words. It designs a neural model for POS tag classification and prediction for words from an 11th-century palaeographic stone inscription script. A Bi-LSTM model with a word-vector embedding layer is trained to classify words into tags and to predict tags for the words of any new script it is given, assigning syntactic tags efficiently. The proposed model achieves 96.43% accuracy, evaluated against existing work in this area.
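As a rough illustration of what feeding words to an embedding layer involves, the sketch below turns tagged sentences into fixed-length integer sequences, the usual input form for an embedding layer followed by a Bi-LSTM tagger. It is not the authors' pipeline: the placeholder tokens (`w1`, `w2`, `w3`), the tag names, the `<pad>`/`<unk>` symbols, and the helper names `build_index` and `encode` are all assumptions made for this example.

```python
from typing import Dict, Iterable, List

PAD, UNK = "<pad>", "<unk>"  # hypothetical reserved symbols


def build_index(sequences: Iterable[List[str]], specials: List[str]) -> Dict[str, int]:
    """Map each distinct item to an integer id; reserved symbols come first."""
    index = {symbol: i for i, symbol in enumerate(specials)}
    for seq in sequences:
        for item in seq:
            index.setdefault(item, len(index))
    return index


def encode(seq: List[str], index: Dict[str, int], max_len: int) -> List[int]:
    """Convert items to ids, truncate to max_len, then right-pad with the PAD id."""
    fallback = index.get(UNK, index[PAD])  # id used for items never seen in training
    ids = [index.get(item, fallback) for item in seq[:max_len]]
    return ids + [index[PAD]] * (max_len - len(ids))


# Toy data: placeholder tokens stand in for segmented inscription words.
sentences = [["w1", "w2"], ["w3", "w1", "w2"]]
tags = [["NOUN", "VERB"], ["ADJ", "NOUN", "VERB"]]

word_index = build_index(sentences, [PAD, UNK])
tag_index = build_index(tags, [PAD])

max_len = 4
X = [encode(s, word_index, max_len) for s in sentences]  # word ids per sentence
y = [encode(t, tag_index, max_len) for t in tags]        # tag ids per sentence
```

Each row of `X` would be looked up in the embedding matrix, and the matching row of `y` would supply the per-word target tags. Padding positions are normally masked out when the loss and accuracy are computed.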

Full Text