In the digital era, extracting hidden sentiments from user reviews plays a prominent role in increasing the profitability of an organization. Interest in Sentiment Analysis (SA) among the research community has grown exponentially, but the field still faces major challenges: identifying sarcastic, ironic, conditional, and modifier statements in a review; identifying aspect and sentiment words as a pair (Data Transformation); rating the recognized aspects to predict the overall aggregated sentiment; and analyzing the design issues involved in implementing parallel aspect-level sentiment analysis. In the present research work, we address each of these challenges using a serial hybridization model, in which the output of each step is the input to the following stage. First, to identify sarcasm, the dictionary is updated with a set of sentiment words using manually crafted rules. Next, to discover aspect and sentiment word pairs, Latent Dirichlet Allocation (LDA) and Gibbs sampling are used. Then, to present the result of sentiment analysis as an overall rating of the data considered, a Latent Aspect Rating Regression (LARR) model is proposed (Data Presentation). Finally, we address the design issues (deciding the number of mappers and reducers needed) in implementing parallel aspect-level sentiment analysis, with the objective of improving resource utilization in Big Data clusters. This work can help researchers working in speech recognition and in the development of recommender systems. The evaluation metrics used to estimate the performance of each step are F-score, Rand Index, classification accuracy, Root Mean Absolute Error (RMAE), and throughput. The findings of our research help customers directly use the results of the proposed model in the form of aspect-level ratings.