Hadoop framework for efficient sentiment classification using trees

K. Sridharan, S. Daniel Madan Raja, G. Komarasamy

doi:10.1049/iet-net.2019.0208

Abstract

Because data are now generated at an ever-increasing rate, the authors are forced to handle massive volumes of data with conventional machine learning algorithms. Big data refers to volumes of data that exceed the capacity of traditional database software tools to collect, store, manage, and process within a stipulated time. Sentiment analysis examines data by classifying text according to the strength and polarity (positive/negative) of the opinion words it contains. For big data, Hadoop provides a platform on which users can build their own sentiment analysis using a lexicon dictionary, an available application programming interface (API), or external programs. The aim of classification is to analyse large data sets and develop an appropriate description or model for each class from the features present in the data. In this work, feature extraction based on term frequency-inverse document frequency (TF-IDF) is combined with the Hadoop framework to attain useful classification using random forest techniques.
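The abstract pairs TF-IDF feature extraction with Hadoop but does not say how the computation is split across nodes. The following is a minimal pure-Python sketch, assuming a MapReduce-style decomposition: a map step emits per-document term frequencies keyed by term, a shuffle groups them by term, and a reduce step applies the inverse document frequency. It does not use any Hadoop API. The TF variant (count divided by document length), the IDF variant (natural log of N over document frequency), the toy corpus, and all function names are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (doc_id, text) pairs; not data from the paper.
DOCS = [
    ("d1", "good movie good acting"),
    ("d2", "bad movie bad plot"),
    ("d3", "good plot"),
]

def tokenize(text):
    # Lower-case whitespace tokenization; a real pipeline would also strip punctuation.
    return text.lower().split()

def map_term_frequency(doc_id, text):
    """Map step (assumed): emit (term, (doc_id, tf)) for each distinct term."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    for term, count in counts.items():
        # Raw count normalised by document length (one common TF variant).
        yield term, (doc_id, count / len(tokens))

def shuffle(pairs):
    """Group mapper output by key, mimicking a MapReduce shuffle/sort stage."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_tf_idf(term, postings, num_docs):
    """Reduce step (assumed): postings holds every (doc_id, tf) for one term."""
    # Document frequency is the number of documents that emitted this term.
    idf = math.log(num_docs / len(postings))
    for doc_id, tf in postings:
        yield (doc_id, term), tf * idf

def tf_idf(docs):
    """Return a sparse {(doc_id, term): weight} map for the whole corpus."""
    num_docs = len(docs)
    mapped = (kv for doc_id, text in docs for kv in map_term_frequency(doc_id, text))
    weights = {}
    for term, postings in shuffle(mapped).items():
        for key, weight in reduce_tf_idf(term, postings, num_docs):
            weights[key] = weight
    return weights

if __name__ == "__main__":
    weights = tf_idf(DOCS)
    # "good" has tf 2/4 in d1 and appears in 2 of 3 documents: 0.5 * ln(1.5)
    print(round(weights[("d1", "good")], 4))  # 0.2027
```

Keying the map output by term means a single reducer sees every document containing that term, so document frequency is simply the length of its posting list. A term that occurs in every document receives an IDF of zero under this variant. The resulting sparse weights are the kind of feature vectors that would then be passed to a classifier such as a random forest.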
