HMDSAD: Hindi multi-domain sentiment aware dictionary

Vandana Jha, Sudhashri S Hebbar, Venugopal K R, P Deepa Shenoy, Savitha R

doi:10.1109/coconet.2015.7411193

Abstract

Sentiment analysis is a fast-growing subarea of natural language processing that extracts users' opinions and classifies them by polarity into positive, negative, or neutral classes. This classification task is required for many purposes, such as opinion mining, opinion summarization, contextual advertising, and market analysis, but it is domain dependent. The words used to convey sentiment in one domain differ from those used in another, and it is costly to annotate corpora in every domain of interest before training a classifier. We attempt to solve this problem by creating a sentiment-aware dictionary from multi-domain data. The source-domain data is labeled as positive or negative at the document level, and the target-domain data is unlabeled. The dictionary is created using both source- and target-domain data. Words used to express positive or negative sentiment in the labeled data are assigned relatedness weights, which signify their co-occurrence frequency with words expressing similar sentiment in the target domain. This work is carried out in Hindi, an official language of India. The number of web pages in Hindi has grown quickly since the introduction of UTF-8 encoding. The dictionary can be used to train a classifier that labels the unlabeled data in the target domain.
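
As a rough illustration of the idea summarized above, the following Python sketch builds a small dictionary in which each polar source-domain word is paired with the target-domain words it co-occurs with, using raw document-level co-occurrence counts as relatedness weights. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `build_dictionary`, the majority-vote polarity rule, the use of raw counts as weights, and the toy Hindi documents are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_dictionary(source_docs, target_docs):
    """Hypothetical sketch of a sentiment-aware dictionary.

    source_docs: list of (tokens, label) pairs, label is "pos" or "neg".
    target_docs: list of token lists (unlabeled).
    Returns {word: (polarity, {target_word: co-occurrence count})}.
    """
    # 1. Count in how many positive / negative source documents each word appears.
    label_counts = defaultdict(Counter)
    for tokens, label in source_docs:
        for word in set(tokens):
            label_counts[word][label] += 1

    # 2. Count document-level co-occurrences over source and target data together.
    cooc = defaultdict(Counter)
    all_docs = [tokens for tokens, _ in source_docs] + list(target_docs)
    for tokens in all_docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    # 3. Keep source words with a clear polarity (assumed rule: majority label)
    #    and attach the target-domain words they co-occur with as weights.
    target_vocab = {word for tokens in target_docs for word in tokens}
    dictionary = {}
    for word, counts in label_counts.items():
        if counts["pos"] == counts["neg"]:
            continue  # ambiguous word, e.g. a topic word seen in both classes
        polarity = "pos" if counts["pos"] > counts["neg"] else "neg"
        related = {t: c for t, c in cooc[word].items() if t in target_vocab}
        dictionary[word] = (polarity, related)
    return dictionary


# Toy data (hypothetical): movie reviews as source, phone reviews as target.
source_docs = [
    (["फिल्म", "अच्छा", "शानदार"], "pos"),
    (["फिल्म", "बुरा"], "neg"),
]
target_docs = [
    ["फोन", "अच्छा", "सुंदर"],
    ["फोन", "बुरा", "बेकार"],
]
dictionary = build_dictionary(source_docs, target_docs)
for word, (polarity, related) in dictionary.items():
    print(word, polarity, related)
```

In this toy run the topic word "फिल्म" is dropped because it appears equally often in both classes, while "अच्छा" and "बुरा" carry their source-domain polarity over to the target-domain words they appear alongside. A practical version would likely normalize the counts (for example with pointwise mutual information) instead of using raw frequencies.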

Full Text