Opinion Mining on US Airline Twitter Data Using Machine Learning Techniques

Abdelrahman I Saad

doi:10.1109/icenco49778.2020.9357390

Abstract

The airline sector is an important part of today's market, and opinion mining can help keep its services responsive to customers. Text sentiment analysis is a Natural Language Processing (NLP) technique for analyzing text. In this research, we use opinion mining, one application of text sentiment analysis, to investigate customer feedback about airline services. One of the largest sources for opinion mining is Twitter, which contains a huge number of tweets that need to be processed and analyzed in order to support decisions and improve a given service. We propose a machine learning model that categorizes Twitter posts as positive, negative, or neutral. We applied the model to a dataset containing tweets about 6 different airlines in the US. The model begins with preprocessing steps, in which we cleaned the tweets and extracted features to represent each tweet as a feature vector, and then built a Bag of Words (BoW) model. In the classification phase, we applied 6 machine learning techniques to classify the tweets: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), XGBoost (XGB), Naïve Bayes (NB), and Decision Tree (DT). In the validation phase, we split the data into 70% training and 30% testing, and used the K-Fold Cross-Validation technique for testing and validation. Finally, we calculated Accuracy, Precision, Recall, and F1-score for each classifier. Comparing the results of the classifiers, we found that SVM had the highest accuracy, 83.31%.
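
As a rough illustration of the stages named in the abstract (cleaning, Bag of Words vectors, K-Fold splitting, and the four reported metrics), the following is a minimal sketch in plain Python. It is not the authors' implementation: the cleaning rules, the function names (`clean_tweet`, `build_vocabulary`, `bow_vector`, `k_fold_indices`, `evaluate`), the example tweets, and the use of macro averaging over the three classes are all assumptions made for illustration, and the classifiers themselves are left out.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letters.

    These cleaning rules are assumed; the abstract does not list them.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace
    return text.split()


def build_vocabulary(token_lists):
    """Map every distinct token to a column index (alphabetical order)."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: idx for idx, tok in enumerate(vocab)}


def bow_vector(tokens, vocab):
    """Represent one tweet as a vector of token counts (Bag of Words)."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def evaluate(y_true, y_pred, labels):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not say which
    averaging scheme was used.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


# Hypothetical tweets, invented for this example only.
tweets = ["@united flight delayed again, flight cancelled",
          "Great crew, great flight!"]
tokens = [clean_tweet(t) for t in tweets]
vocab = build_vocabulary(tokens)
vectors = [bow_vector(t, vocab) for t in tokens]
```

In a complete pipeline, the vectors produced for each training fold would be passed to a classifier such as an SVM, and `evaluate` would be called on that fold's held-out predictions.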

Full Text