Abstract

Due to the continuous and rapid growth of data posted daily on social media sites in many different languages, automated classification has become one of the most important tasks for handling, managing, and organizing this huge volume of textual data. Among the many social media sites, Twitter is one of the most popular and widely used: users communicate with each other, share their opinions, and express their emotions (sentiments) in short microblog posts (Tweets) limited to 140 characters. Accordingly, companies and organizations may analyze these sentiments to evaluate users' thoughts and determine the polarity of the text. Natural language processing techniques, statistics, or machine learning algorithms are used to identify and extract the sentiment of the text, and in practice many data mining techniques and algorithms are applied to observe patterns and correlations in such large amounts of data. This paper proposes an efficient approach to handling Tweets in both Arabic and English, with different processing techniques applied. The approach uses the Vector Space Model (VSM) to represent text documents and Tweets, and Term Frequency-Inverse Document Frequency (TF-IDF) term weighting to generate the feature vectors for the classification process. The proposed approach has been evaluated in several experiments on five datasets with six classifiers: Decision Trees, Naive Bayes, kNN, Logistic Regression, Perceptron, and Multilayer Perceptron. The experimental results show the effectiveness of the proposed approach when its classification results are compared with the published work in [1, 2, 3].
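The abstract does not specify the exact TF-IDF variant, the tokenization, or the Arabic/English preprocessing, so the following is only a minimal sketch under stated assumptions: tokens come from lowercasing and whitespace splitting, tf(t, d) is the raw count of t divided by the length of d, and idf(t) = log(N / df(t)). It builds sparse VSM vectors and classifies a new Tweet with kNN (one of the listed classifiers) using cosine similarity. The function names and the toy labelled Tweets are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split only; the paper's preprocessing
    # (stemming, stop-word removal, Arabic normalization) is not reproduced here.
    return text.lower().split()


def fit_idf(docs):
    """Compute idf(t) = log(N / df(t)) over a list of token lists."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse VSM vector {term: tf * idf}, with tf = raw count / document length.

    Terms unseen during fit_idf are dropped (they have no idf weight).
    """
    counts = Counter(tokens)
    length = len(tokens)
    return {t: (c / length) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query_vec, train_vecs, labels, k=3):
    """Majority label among the k training vectors most similar to the query."""
    scored = sorted(
        zip((cosine(query_vec, v) for v in train_vecs), labels), reverse=True
    )
    top = [label for _, label in scored[:k]]
    return Counter(top).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy Tweets with polarity labels.
    train_texts = [
        "i love this phone",
        "great service and great staff",
        "i hate the delay",
        "terrible awful service",
    ]
    labels = ["pos", "pos", "neg", "neg"]

    train_docs = [tokenize(t) for t in train_texts]
    idf = fit_idf(train_docs)
    train_vecs = [tfidf_vector(d, idf) for d in train_docs]

    for tweet in ["i love great service", "terrible delay"]:
        vec = tfidf_vector(tokenize(tweet), idf)
        print(tweet, "->", knn_predict(vec, train_vecs, labels, k=3))
```

Terms that occur in every training document receive idf = 0 and therefore contribute nothing to the vector, which is the usual motivation for IDF weighting. A production system would more likely use a library vectorizer and apply the other classifiers named above to the same feature vectors.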
