Abstract

Sentiment analysis is the study of people's opinions, sentiments, attitudes and emotions as expressed in written language, but carrying it out is time-consuming, inconsistent and costly in a business context. Pre-processing, the cleaning and preparation of text before analysis, helps to ease this difficulty. Existing pre-processing techniques include handling of expressive lengthening, emoticon handling, HTML tag removal, punctuation handling, slang handling, stopword removal, stemming and lemmatization. This paper analyzes the effect of these techniques and their combinations on the Kaggle dataset Bag of Words Meets Bags of Popcorn. Every possible combination of pre-processing techniques was evaluated, with the aim of finding the one that gives the highest accuracy. A Random Forest classifier was used to predict sentiment, as it is known to give good accuracy, and the results were evaluated using 10-fold cross-validation. Accuracy increased from unprocessed to pre-processed data. It was concluded that applying pre-processing techniques gives higher accuracy than the traditional approach, i.e. no pre-processing.
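The experimental setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the four cleaning functions, the tiny stopword list, the fixed order in which techniques are applied, and the bag-of-words features built with scikit-learn's `CountVectorizer` are all assumptions made for illustration. If each of the eight listed techniques were simply switched on or off, there would be 2^8 = 256 combinations; the sketch uses four techniques, giving 16.

```python
import re
from itertools import combinations

# Hypothetical, simplified stand-ins for four of the techniques named in the
# abstract. The paper's actual implementations are not given in this section.
def remove_html_tags(text):
    return re.sub(r"<[^>]+>", " ", text)

def shorten_lengthening(text):
    # Collapse runs of three or more identical characters down to two.
    return re.sub(r"(.)\1{2,}", r"\1\1", text)

def strip_punctuation(text):
    return re.sub(r"[^\w\s]", " ", text)

STOPWORDS = {"a", "an", "the", "is", "was", "this", "it", "of", "and"}  # toy list

def remove_stopwords(text):
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

TECHNIQUES = {
    "html": remove_html_tags,
    "lengthening": shorten_lengthening,
    "punctuation": strip_punctuation,
    "stopwords": remove_stopwords,
}

def all_combinations(names):
    """Yield every subset of technique names; the empty tuple is the
    no-pre-processing baseline."""
    names = list(names)
    for r in range(len(names) + 1):
        yield from combinations(names, r)

def apply_combo(text, combo):
    """Apply the selected techniques in the order given, then normalize spaces."""
    for name in combo:
        text = TECHNIQUES[name](text)
    return " ".join(text.split())

def evaluate(texts, labels, combo, folds=10):
    """Mean k-fold accuracy of a Random Forest on bag-of-words counts.
    Requires scikit-learn; imported lazily so the helpers above stay stdlib-only."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    cleaned = [apply_combo(t, combo) for t in texts]
    model = make_pipeline(CountVectorizer(), RandomForestClassifier(random_state=0))
    scores = cross_val_score(model, cleaned, labels, cv=folds, scoring="accuracy")
    return scores.mean()
```

Given lists `texts` and `labels` loaded from the dataset, the best combination under these assumptions could be selected with `max(all_combinations(TECHNIQUES), key=lambda c: evaluate(texts, labels, c))`. Putting the vectorizer inside the pipeline keeps the vocabulary from being fitted on the held-out fold.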
