A Deep Learning Approach to Malayalam Parts of Speech Tagging

Anto P. Babu, M. K. Junaida

doi:10.1007/978-3-030-49500-8_21

Abstract

This paper presents a deep-learning-based approach to Malayalam part-of-speech (POS) tagging. We applied two neural sequence labelling models: long short-term memory (LSTM) and convolutional neural network (CNN). The proposed model is an end-to-end deep neural network that benefits from both word-level and character-level representations. We studied the performance of six different combinations of neural sequence labelling models on the ILCI Phase II Malayalam dataset and achieved an accuracy of up to 87.05% for POS tagging. The proposed word LSTM model with a character LSTM and a softmax output layer gives a small improvement over the character LSTM and Conditional Random Field (CRF) models. We also demonstrate the effect of using word and character embeddings together for Malayalam POS tagging. The proposed approach can be extended to other languages and to other sequence labelling tasks such as chunking and named entity recognition.
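The abstract compares a softmax output layer with a CRF output layer on top of the word LSTM. The sketch below illustrates, in plain Python, how the two differ at decoding time; it is not the authors' implementation. It assumes the per-token emission scores (one score per tag, as a word LSTM would produce) are already computed, and the tag names, scores, and transition matrix are hypothetical toy values. A softmax layer picks the best tag for each token independently, while a linear-chain CRF uses Viterbi decoding to pick the whole tag sequence that maximises emission scores plus tag-to-tag transition scores.

```python
from typing import List, Sequence, Tuple


def greedy_decode(emissions: Sequence[Sequence[float]]) -> List[int]:
    """Softmax-style decoding: choose the highest-scoring tag for each token
    independently (argmax of softmax equals argmax of the raw scores)."""
    return [max(range(len(row)), key=lambda t: row[t]) for row in emissions]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """CRF-style decoding: find the tag sequence maximising the sum of
    emission scores plus transition scores transitions[prev][curr]."""
    n_tags = len(emissions[0])
    # score[t] = best total score of any path ending in tag t so far.
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for row in emissions[1:]:
        new_score: List[float] = []
        pointers: List[int] = []
        for curr in range(n_tags):
            # Best previous tag to transition from into `curr`.
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][curr])
            new_score.append(score[best_prev] + transitions[best_prev][curr] + row[curr])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    best_last = max(range(n_tags), key=lambda t: score[t])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Hypothetical toy example: two tags, three tokens.
TAGS = ["N", "V"]
emissions = [[2.0, 1.0], [1.0, 1.2], [0.5, 2.0]]
# Hypothetical transitions: N->N is rewarded, V->V is penalised.
transitions = [[1.0, 0.0], [0.0, -2.0]]

print([TAGS[t] for t in greedy_decode(emissions)])  # ['N', 'V', 'V']
path, best = viterbi_decode(emissions, transitions)
print([TAGS[t] for t in path], best)  # ['N', 'N', 'V'] 6.0
```

In this toy case the greedy (softmax) decoder ignores that V followed by V is penalised, whereas the Viterbi decoder trades a slightly lower emission score at the second token for a better overall sequence score. In a trained model the transition matrix would be learned jointly with the LSTM rather than set by hand.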

Full Text