Abstract

Twitter is one of the most popular social networking sites on the Internet for sharing opinions and knowledge. Many advertisers use Tweets to collect features and attributes of Tweeters in order to target specific groups of highly engaged people. Gender detection is a sub-field of sentiment analysis concerned with extracting and predicting the gender of a Tweet's author. In this paper, we investigate the gender of Tweet authors by applying different classification techniques to Arabic-language Tweets, such as Naive Bayes (NB), Support Vector Machine (SVM), Naive Bayes Multinomial (NBM), the J48 decision tree, and k-nearest neighbors (KNN). The results show that the NBM, SVM, and J48 classifiers can achieve accuracy above 98% when the names of Tweet authors are added as a feature. The results also show that the preprocessing approach has a negative effect on the accuracy of gender detection. In a nutshell, this study demonstrates that machine learning classifiers are able to detect the gender of Arabic Tweet authors.
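
To make the idea of adding the author's name as a feature concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier written from scratch. It is not the authors' implementation: the helper `featurize`, the class `MultinomialNB`, the `NAME=` token prefix, and the tiny English training set are all hypothetical and chosen only for illustration. The data is made up and says nothing about the accuracy reported above.

```python
import math
from collections import Counter, defaultdict


def featurize(tweet_text, author_name=None):
    """Bag-of-words tokens, plus an optional author-name token (hypothetical helper)."""
    tokens = tweet_text.split()
    if author_name is not None:
        # The prefix keeps the name feature distinct from ordinary words.
        tokens.append("NAME=" + author_name)
    return tokens


class MultinomialNB:
    """Illustrative multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = (sum(self.token_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.token_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training examples: (tweet text, author name, gender label).
train = [
    ("great match tonight", "Ahmed", "male"),
    ("lovely day with friends", "Fatima", "female"),
    ("watching the match", "Omar", "male"),
    ("shopping with friends", "Layla", "female"),
]
docs = [featurize(text, name) for text, name, _ in train]
labels = [gender for _, _, gender in train]
model = MultinomialNB().fit(docs, labels)

# None of the words in this tweet were seen in training,
# so only the name token contributes evidence.
prediction = model.predict(featurize("nice weather today", "Fatima"))
```

In this sketch the name is simply one more token in the bag of words, so any of the classifiers listed in the abstract could consume the same feature vector. How the study actually encodes names is not stated here.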

Highlights

  • Many social websites, such as Twitter, Facebook, Myspace and blogs, now make the Internet a large repository of different types of data

  • This research aims to test the ability of several machine learning classifiers, such as J48, KNN, Naïve Bayes, Naïve Bayes Multinomial (NBM) and Support Vector Machine (SVM), to detect the gender of the authors of Arabic Tweets

  • The results show a negative effect of preprocessing on the accuracy of all classifiers


Summary

Introduction

Many social websites, such as Twitter, Facebook, Myspace and blogs, now make the Internet a large repository of different types of data. These media allow different types of users from different cultures and languages to communicate and share their opinions and experiences with others. These opinions represent many kinds of information (political, sports, technology, etc.) that come from different sources. Sentiment analysis, or opinion mining, is a field that aims to extract or predict the polarity of people's opinions in specific areas. This is considered a challenging task for sentiment analysis.

