Lexicon-based sentiment analysis system for Malayalam language

M P Ashna, Ancy K Sunny

doi:10.1109/iccmc.2017.8282571

Abstract

Sentiment analysis is a natural language processing task that mines information from various text forms, such as reviews, news, and blogs, and classifies them by polarity as positive, negative, or neutral. Mining sentiments in Malayalam comes with many issues and challenges. Compared with English, Malayalam is a morphologically rich language with free word order, which adds complexity when handling user-generated content. Much of the research in Malayalam sentiment analysis has used supervised learning techniques. Although supervised learning methods provide better accuracy than the dictionary-based approach, they cannot perform well without sufficient training examples, and their accuracy is directly related to the quality of the training corpus. In the dictionary-based approach, a sentiment lexicon is created from a pre-annotated seed list of words, expanded with their synonyms and antonyms obtained from WordNet, and is then used to classify sentiment. Compared with supervised learning techniques, the dictionary-based approach takes less processing time. However, no sentiment lexicon is readily available for Malayalam, so one must be created before lexicon-based sentiment analysis can be performed. In this work, a lexicon-based, document-level sentiment analysis system is proposed for the Malayalam language. A dictionary-based approach is used to develop the Malayalam sentiment lexicon, because the dictionary-based method is typically more efficient than other approaches and can cover all the words in the dictionary. In addition, the dictionary-based approach is not domain-specific, which means it is applicable to all domains. The proposed system gives an accuracy of 87.5% for sentence-level classification and 90% for document-level classification.
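The dictionary-based pipeline summarized in the abstract (expand a seed list through synonyms and antonyms, then score sentences and documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the English placeholder words standing in for Malayalam tokens, the toy `SYNONYMS`/`ANTONYMS` tables standing in for a WordNet resource, the whitespace tokenizer, and the rule of summing sentence scores into a document label are all hypothetical assumptions.

```python
from collections import deque

# Hypothetical seed list: word -> polarity (+1 positive, -1 negative).
# English placeholders stand in for pre-annotated Malayalam seed words.
SEEDS = {"good": 1, "bad": -1}

# Hypothetical toy stand-in for a WordNet-style synonym/antonym resource.
SYNONYMS = {
    "good": ["fine", "excellent"],
    "bad": ["poor", "awful"],
    "excellent": ["superb"],
}
ANTONYMS = {
    "good": ["bad"],
    "excellent": ["terrible"],
}


def build_lexicon(seeds, synonyms, antonyms):
    """Expand the seed list breadth-first.

    A synonym inherits the polarity of the word it was reached from; an
    antonym receives the opposite polarity. The first polarity assigned
    to a word is kept (a simplifying assumption).
    """
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for related, sign in ((synonyms.get(word, []), 1),
                              (antonyms.get(word, []), -1)):
            for other in related:
                if other not in lexicon:
                    lexicon[other] = lexicon[word] * sign
                    queue.append(other)
    return lexicon


def sentence_score(sentence, lexicon):
    """Sum the polarities of known tokens; unknown tokens count as 0."""
    return sum(lexicon.get(token, 0) for token in sentence.lower().split())


def to_label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_document(sentences, lexicon):
    """Label each sentence, then label the document by the summed scores."""
    scores = [sentence_score(s, lexicon) for s in sentences]
    return [to_label(s) for s in scores], to_label(sum(scores))


if __name__ == "__main__":
    lexicon = build_lexicon(SEEDS, SYNONYMS, ANTONYMS)
    review = ["The food was superb", "Service was poor", "Overall a good place"]
    print(classify_document(review, lexicon))
    # (['positive', 'negative', 'positive'], 'positive')
```

Whitespace tokenization is a deliberate simplification here; because Malayalam is morphologically rich, a practical system would likely need stemming or morphological analysis before looking tokens up in the lexicon.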
