Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning

Ayaz H Khan,Muhammad Zubair

doi:10.1007/s11042-020-09512-2

Abstract

Twitter is a social media platform which has been proven to be a great tool for insights of emotions about products, policies etc. through a 280-character message called tweet, containing direct and unfiltered emotions by a large amount of user population. Twitter has attracted the attention of many researchers owing to the fact that every tweet is by default, public in nature which is not the case with Facebook. This paper proposes a model for multi-lingual (English and Roman Urdu) classification of tweets over diversely ranged classes (non-hierarchical architecture). Previous work in tweet classification is narrowly focused either on single language or either on uniform set of classes at most (Positive, Extremely Positive, Negative and Extremely Negative). The proposed model is based on semi-supervised learning and proposed feature selection approach makes it less dependent and highly adaptive for grabbing trending terms. This makes it a strong contender of choice for streaming data. In the methodology, using Naive Bayes learning algorithm for each phase, obtained remarkable accuracy of up to 87.16% leading from both KNN and SVM models which are popular for NLP and Text classification domains.

Full Text