Abstract

Social networks are one of the main sources of big data. They continuously produce huge volumes of varied data at high velocity. This data contains valuable information, and extracting it requires efficient and scalable analysis techniques. Hadoop/MapReduce is widely considered a suitable framework for handling big data because of its scalability, reliability and simplicity. One of the basic applications for extracting valuable information from data is sentiment analysis, which studies people's opinions by classifying their written text as having positive or negative polarity. This work examines a sentiment analysis method for a Twitter data set. The method uses the Naive Bayes algorithm to classify text as positive or negative. Several linguistic and NLP preprocessing techniques were applied to the data set in order to study their effect on the quality of big data classification. The applied preprocessing techniques improved the classification accuracy of the Naive Bayes algorithm. The experiments show that NLP and linguistic preprocessing enhance the performance of the sentiment analysis by 5%, yielding an accuracy of 73% on the data set used.
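As an illustration of the kind of pipeline described above, the following is a minimal single-machine sketch of text preprocessing followed by a multinomial Naive Bayes classifier with add-one smoothing. It is not the implementation used in this work: the abstract does not list the specific preprocessing steps, so the steps shown here (lowercasing, removal of URLs and @mentions, stop-word removal) are assumptions, as are the stop-word list, the toy tweets, and the names `preprocess` and `NaiveBayes`. The Hadoop/MapReduce distribution of the computation is also omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "to", "and", "i"}


def preprocess(text):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = preprocess(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in preprocess(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch, not from the paper's data set).
train_texts = [
    "I love this phone, it is great",
    "What a great and happy day @bob",
    "I hate this terrible service",
    "Awful day, so sad and terrible http://t.co/x",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great phone, love it"))
print(model.predict("terrible and sad service"))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and add-one smoothing keeps a token that was never seen in one class from driving that class's probability to zero. The effect of a preprocessing step can be studied by toggling it inside `preprocess` and comparing accuracy on held-out data.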
