Sentiment Analysis of Tweets in Malayalam Using Long Short-Term Memory Units and Convolutional Neural Nets

S. Sachin Kumar, M. Anand Kumar, K. P. Soman

doi:10.1007/978-3-319-71928-3_31

Abstract

Sentiment analysis is an important task in natural language processing (NLP) because text content carries opinions about events, product and movie reviews, trading, marketing, etc. Over the past decade, researchers have performed sentiment analysis using hand-crafted features and machine learning methods such as support vector machines, naive Bayes, conditional random fields, and maximum entropy models. Sentiment analysis of social media text has gained considerable popularity because such text contains recommendations and suggestions. Compared with high-resource languages such as English, Chinese, and French, sentiment analysis in low-resource languages suffers from the absence of (1) annotated corpora and (2) tools to extract features. This paper is motivated by that gap and addresses sentiment analysis of tweets in Malayalam, a low-resource language. Because no dataset was available, 12,922 Malayalam tweets were collected and each was annotated with one of three sentiment categories: positive, negative, or neutral. Recently, deep learning methods such as long short-term memory (LSTM) networks and convolutional neural networks (CNN) have gained popularity by showing promising results on problems in speech and image processing and on NLP tasks, learning feature-rich deep representations automatically from the data. This paper is the first attempt to perform sentiment analysis of Malayalam tweets using LSTM and CNN. The paper presents 10-fold cross-validation results.

Full Text