Sentiment Analysis in Tamil Texts: A Study on Machine Learning Techniques and Feature Representation

Sajeetha Thavareesan, Sinnathamby Mahesan

doi:10.1109/iciis47346.2019.9063341

Abstract

Sentiment Analysis (SA) is an application of Natural Language Processing (NLP) that extracts the sentiments expressed in text. In this paper, we experiment with five approaches to SA: a lexicon-based approach, a supervised machine learning-based approach, a hybrid approach, K-means with Bag of Words (BoW), and K-modes with BoW. We evaluate these approaches on five corpora with different feature representation techniques to identify the best approach to SA in Tamil texts. In addition to traditional features such as BoW and Term Frequency-Inverse Document Frequency (TF-IDF), we include basic features such as word count and punctuation count to check their influence on prediction. We compare the approaches, the features, and the corpora. In our evaluation, the highest accuracy, 79%, is obtained on the UJ_Corpus_Opinions_Nouns corpus using fastText with the supervised machine learning-based approach.
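
The feature representations named in the abstract (word count, punctuation count, BoW, and TF-IDF) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the whitespace tokenizer, the use of ASCII punctuation only, the TF-IDF variant (relative term frequency times log(N/df)), and the two toy sentences are all assumptions made for illustration.

```python
import math
import string
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization after stripping ASCII punctuation.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def basic_features(text):
    # "Basic" features in the abstract's sense: word and punctuation counts.
    return {
        "word_count": len(tokenize(text)),
        "punct_count": sum(ch in string.punctuation for ch in text),
    }


def bag_of_words(docs):
    # One raw count vector per document over a shared, sorted vocabulary.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(docs):
    # One common TF-IDF variant (an assumption, not necessarily the paper's):
    # relative term frequency multiplied by log(N / document frequency).
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for vec in bow if vec[j] > 0) for j in range(len(vocab))]
    weighted = []
    for vec in bow:
        total = sum(vec) or 1  # guard against empty documents
        weighted.append(
            [(count / total) * math.log(n_docs / doc_freq[j])
             for j, count in enumerate(vec)]
        )
    return vocab, weighted


# Hypothetical toy sentences, not taken from the paper's corpora.
docs = ["படம் நன்றாக உள்ளது", "படம் மோசம்!"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
features = [basic_features(doc) for doc in docs]
```

In this sketch a word that occurs in every document, such as "படம்", receives a TF-IDF weight of zero, whereas BoW still counts it; this difference is one reason the two representations can lead to different classifier behaviour.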

Full Text