Abstract

This contribution proposes a new model for sentiment analysis that combines the convolutional neural network (CNN), the C4.5 decision tree algorithm, and a Fuzzy Rule-Based System (FRBS). Our suggested method consists of six parts. First, we apply several pre-processing techniques. Second, we use the fastText method to vectorize the analysed tweets. Third, we use the CNN to extract and select the pertinent features from the tweets. Fourth, we fuzzify the CNN output using the Gaussian Fuzzification (GF) method to cope with vague data. Fifth, we apply fuzzy C4.5 to create the fuzzy rules. Finally, we use the General Fuzzy Reasoning (GFR) approach to classify new tweets. In summary, our method integrates the advantages of the CNN and C4.5 techniques and overcomes the shortcomings of ambiguous data in the tweets using the FRBS, which consists of three phases: a fuzzification phase using GF, an inference mechanism using fuzzy C4.5, and a defuzzification phase using GFR. In addition, to give our approach the ability to deal with massive data, we have implemented it on a Hadoop cluster of five computers. The experimental findings confirm that our model performs very well compared to the other models chosen from the literature.
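
As an illustration of the fuzzification phase only, the following minimal Python sketch maps crisp feature values (standing in for CNN outputs) to Gaussian membership degrees. The linguistic terms, their centers and widths, and the function names are hypothetical choices made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

# Hypothetical linguistic terms: name -> (center, sigma) of a Gaussian
# membership function. These values are illustrative assumptions, not
# parameters taken from the paper.
TERMS = {
    "low": (0.0, 0.2),
    "medium": (0.5, 0.2),
    "high": (1.0, 0.2),
}


def gaussian_membership(x, center, sigma):
    """Degree (in (0, 1]) to which x belongs to a Gaussian-shaped fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fuzzify(x, terms=TERMS):
    """Map one crisp feature value to a membership degree per linguistic term."""
    return {name: gaussian_membership(x, c, s) for name, (c, s) in terms.items()}


def fuzzify_vector(features, terms=TERMS):
    """Fuzzify every value of a feature vector independently."""
    return [fuzzify(x, terms) for x in features]


if __name__ == "__main__":
    cnn_features = [0.1, 0.55, 0.9]  # made-up values standing in for CNN outputs
    for value, degrees in zip(cnn_features, fuzzify_vector(cnn_features)):
        strongest = max(degrees, key=degrees.get)  # term with the highest degree
        print(value, strongest, {k: round(v, 3) for k, v in degrees.items()})
```

Each value receives a degree for every term rather than a single hard label, which is what lets a downstream fuzzy rule learner work with vague inputs.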

Highlights

  • This section elaborates on the proposed hybrid approach, which combines the convolutional neural network, the C4.5 decision tree algorithm, a fuzzy rule-based system, and the Hadoop platform to perform English sentence-level classification

Summary

Introduction

Humans communicate with each other. Throughout human history, communication has been deemed an essential tool for resolving problems and strengthening social commitment and engagement. In the area of natural language processing (NLP), researchers have carried out Twitter opinion mining by performing five tasks: data collection, data cleaning, data vectorization, feature extraction, and data classification. For data classification, they have employed algorithms drawn from several families of approaches, such as lexicon-based techniques, fuzzy rule-based systems, machine learning, hybrid strategies, and deep learning. Deep learning approaches aim to overcome the problems of machine learning and lexicon-based techniques through extensive automatic feature engineering; that is, they automatically extract relevant and complete features, which improves accuracy. This motivated us to carry out Twitter opinion mining for the English language by applying a hybrid classifier that incorporates the CNN and C4.5 algorithms. Section 5 concludes with several recommendations for future contributions.

Literature reviews
Our proposed approach
Data acquisition
Data cleaning
Data vectorization
Features extraction and selection
Data fuzzification
Data classification
Data parallelization
Experiments and results
Findings
Conclusion