Abstract

Media and business companies increasingly use social media to reach large audiences and maximize profit. These companies look for the best ways to satisfy their users' requirements, but those requirements are difficult to understand because of the very large number of users on platforms such as Twitter. The goal of our research project is therefore to build a classifier that can detect Arabian trends among Twitter users in the Gulf area. Such a classifier can help these companies deliver suitable products and media content, such as photos and videos, according to users' trends. Using a Java-based tool of our own design, we collected a significant dataset of tweets. We then carried out two tweet classification experiments to compare the effects of balanced and imbalanced training data and to measure the effect of data size on classifier accuracy. In both experiments, Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Naïve Bayes algorithms were used as classifiers. The first experiment used small, imbalanced datasets with four classes: Sport, Politics, Islam and Culture. The Light and Root Stemmers were used with each classifier. The best result in this experiment was obtained with the Naïve Bayes algorithm and the Light Stemmer, which reached an accuracy of 76.27%. In the second experiment, we used a large, balanced dataset with the same classifiers and added one more class, Economics. The experimental results showed that the best accuracy (81.17%) was obtained by SVM with the Light Stemmer. The Light Stemmer achieved the best outcomes for all classifiers, since almost all of the tweets were written in dialects.
