Abstract

This paper describes SiTAKA, the system we submitted to SemEval-2017 Task 4 (Sentiment Analysis in Twitter), Subtask A, for both English and Arabic. The system represents tweets using a novel set of features, which include a bag of negated words and the information provided by several lexicons. The polarity of each tweet is determined by a Support Vector Machine (SVM) classifier. Our system ranks 2nd among 8 systems on Arabic tweets and 8th among 38 systems on English tweets.

Highlights

  • Sentiment analysis in Twitter is the problem of identifying people’s opinions expressed in tweets

  • This paper represents tweets using a novel set of features, which include the information provided by seven lexicons and a bag of negated words (BonW)

  • The evaluation metrics used by the task organizers were the macroaveraged recall (ρ), the F1 score averaged over the positive and negative classes (F1^PN), and accuracy (Acc) (Rosenthal et al., 2017)
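For reference, the three scores can be computed as in the minimal Python sketch below, assuming gold and predicted labels are parallel lists over the labels `positive`, `negative`, and `neutral`. ρ averages per-class recall over all three classes, while F1^PN averages the per-class F1 of the positive and negative classes only. The function names are illustrative; this is not the organizers' official scoring script.

```python
# Minimal sketch of the three SemEval-2017 Task 4A scores (not the official scorer).
LABELS = ("positive", "negative", "neutral")


def recall(gold, pred, label):
    """Fraction of gold tweets with `label` that were also predicted as `label`."""
    support = sum(1 for g in gold if g == label)
    if support == 0:
        return 0.0
    hits = sum(1 for g, p in zip(gold, pred) if g == p == label)
    return hits / support


def f1(gold, pred, label):
    """Per-class F1: harmonic mean of precision and recall for `label`."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    if tp == 0:
        return 0.0
    precision = tp / sum(1 for p in pred if p == label)
    rec = tp / sum(1 for g in gold if g == label)
    return 2 * precision * rec / (precision + rec)


def task4a_scores(gold, pred):
    """Return (rho, f1_pn, acc) for parallel lists of gold and predicted labels."""
    # rho: recall averaged over positive, negative AND neutral.
    rho = sum(recall(gold, pred, label) for label in LABELS) / len(LABELS)
    # F1^PN: F1 averaged over the positive and negative classes only.
    f1_pn = (f1(gold, pred, "positive") + f1(gold, pred, "negative")) / 2
    acc = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    return rho, f1_pn, acc


gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "neutral", "neutral", "negative"]
print(task4a_scores(gold, pred))  # rho = 0.5, F1^PN = 1/3, Acc = 0.5
```

Because the neutral class is excluded from F1^PN, a system that handles neutral tweets well but confuses positive and negative ones is penalized more by F1^PN than by ρ.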


Summary

Introduction

Sentiment analysis in Twitter is the problem of identifying people’s opinions expressed in tweets. The success of machine learning models rests on two main factors: a large amount of labeled data and a well-designed set of features that can distinguish between positive, negative, and neutral samples. Following this approach, most studies have focused on designing efficient feature sets to obtain good classification performance (Feldman, 2013; Liu, 2012; Pang and Lee, 2008). This paper represents tweets using a novel set of features, which include the information provided by seven lexicons and a bag of negated words (BonW). The last section presents the conclusions and directions for further work.
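The summary does not say exactly how SiTAKA builds its bag of negated words or turns lexicon entries into features. The Python sketch below assumes one common convention: every token that follows a negation cue, up to the next punctuation mark, is moved into a separate bag with a `_NEG` suffix. `NEGATION_CUES`, `TOY_LEXICON`, the simplified tokenizer, and the sign-flipping rule in `lexicon_features` are hypothetical illustrations, not the paper's actual resources or rules.

```python
import re
from collections import Counter

# Hypothetical negation cues; "n't" contractions are also treated as cues.
NEGATION_CUES = {"not", "no", "never", "nothing", "nobody", "cannot"}
CLAUSE_PUNCT = set(".,!?;:")
# Simplified tokenizer: lowercase words (with apostrophes) or runs of punctuation.
TOKEN_RE = re.compile(r"[a-z']+|[.,!?;:]+")

# Hypothetical toy lexicon standing in for SiTAKA's seven real lexicons.
TOY_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}


def is_negation(token):
    return token in NEGATION_CUES or token.endswith("n't")


def bag_of_negated_words(text):
    """Split a tweet into a plain bag and a bag of negated words (suffix _NEG)."""
    plain, negated = Counter(), Counter()
    in_scope = False
    for tok in TOKEN_RE.findall(text.lower()):
        if tok[0] in CLAUSE_PUNCT:
            in_scope = False  # punctuation closes the negation scope
        elif is_negation(tok):
            plain[tok] += 1
            in_scope = True  # the cue negates the tokens that follow it
        elif in_scope:
            negated[tok + "_NEG"] += 1
        else:
            plain[tok] += 1
    return plain, negated


def lexicon_features(plain, negated, lexicon=TOY_LEXICON):
    """Sum positive and negative lexicon scores; negated words get their sign flipped."""
    scores = []
    for word, count in plain.items():
        scores += [lexicon.get(word, 0.0)] * count
    for word, count in negated.items():
        base = word[: -len("_NEG")]  # strip the suffix before the lexicon lookup
        scores += [-lexicon.get(base, 0.0)] * count
    return {
        "pos_sum": sum(s for s in scores if s > 0),
        "neg_sum": sum(s for s in scores if s < 0),
        "n_matched": sum(1 for s in scores if s != 0),
    }


plain, negated = bag_of_negated_words("The food was not good, but the service was great!")
print(negated)                           # Counter({'good_NEG': 1})
print(lexicon_features(plain, negated))  # {'pos_sum': 2.0, 'neg_sum': -1.0, 'n_matched': 2}
```

Keeping negated tokens in their own bag lets a classifier learn separate weights for "good" and "good_NEG" instead of treating both occurrences as the same positive cue.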

Resources
En-SiTAKA Lexicons
Ar-SiTAKA Lexicons
Embeddings
Preprocessing and Normalization
Features Extraction
Syntactic Features
Lexicon Features
Cluster Features
Embedding Features
Classifier
Results
Conclusion