Abstract
This paper describes SiTAKA, our system for SemEval-2017 Task 4A, Sentiment Analysis in Twitter, in which we took part for both English and Arabic. The system represents tweets using a novel set of features, which includes a bag of negated words and information provided by several lexicons. The polarity of each tweet is determined by a Support Vector Machine classifier. Our system ranked 2nd among 8 systems on the Arabic tweets and 8th among 38 systems on the English tweets.
Highlights
Sentiment analysis in Twitter is the problem of identifying people’s opinions expressed in tweets
This paper proposes representing tweets with a novel set of features, which includes information provided by seven lexicons and a bag of negated words (BonW)
The evaluation metrics used by the task organizers were the macro-averaged recall (ρ), the F1 averaged across the positive and negative classes ($F_1^{PN}$), and the accuracy (Acc) (Rosenthal et al., 2017)
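As a minimal sketch of how the three metrics named above can be computed for a three-class task, the Python below derives per-class precision and recall from gold and predicted labels. The function names, the label strings, and the choice to score an empty class as 0.0 are assumptions made for illustration; this is not the organizers' official scorer.

```python
from collections import Counter

# Label strings are an assumption; any three distinct class names work.
LABELS = ("positive", "negative", "neutral")


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(gold, pred):
    """Compute macro-averaged recall, F1 over positive/negative, and accuracy."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1  # the gold class was missed
            fp[p] += 1  # the predicted class was wrongly assigned

    recall = {c: safe_div(tp[c], tp[c] + fn[c]) for c in LABELS}
    precision = {c: safe_div(tp[c], tp[c] + fp[c]) for c in LABELS}
    f1 = {
        c: safe_div(2 * precision[c] * recall[c], precision[c] + recall[c])
        for c in LABELS
    }

    return {
        # Recall averaged over all three classes.
        "avg_rec": sum(recall.values()) / len(LABELS),
        # F1 averaged over the positive and negative classes only.
        "f1_pn": (f1["positive"] + f1["negative"]) / 2,
        # Fraction of tweets labelled correctly.
        "acc": safe_div(sum(tp.values()), len(gold)),
    }


if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "neutral"]
    pred = ["positive", "negative", "negative", "neutral"]
    print(evaluate(gold, pred))
```

Macro-averaged recall weights every class equally regardless of its frequency, which makes it less sensitive to class imbalance than accuracy.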
Summary
Sentiment analysis in Twitter is the problem of identifying people’s opinions expressed in tweets. The success of machine learning models on this problem rests on two main factors: a large amount of labeled data and the careful design of a set of features that can distinguish between positive, negative and neutral samples. With this approach, most studies have focused on designing a set of efficient features to obtain good classification performance (Feldman, 2013; Liu, 2012; Pang and Lee, 2008). This paper proposes representing tweets with a novel set of features, which includes information provided by seven lexicons and a bag of negated words (BonW). The last section presents the conclusions and directions for further work.
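The summary does not spell out how the bag of negated words is built, so the following Python is only a sketch under stated assumptions: it uses a common negation-scope heuristic in which every token after a negation cue, up to the next punctuation mark, counts as negated. The cue list, the tokenizer, and the function names are hypothetical and are not taken from the SiTAKA system.

```python
import re
from collections import Counter

# Hypothetical cue list; a real system would likely use a larger one.
NEGATION_CUES = {"not", "no", "never", "cannot", "nobody", "nothing"}
CLAUSE_END = {".", ",", ":", ";", "!", "?"}


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.,:;!?]", text.lower())


def bag_of_negated_words(text):
    """Count the words that fall inside an assumed negation scope."""
    negated = Counter()
    in_scope = False
    for token in tokenize(text):
        if token in CLAUSE_END:
            in_scope = False  # punctuation closes the negation scope
        elif token in NEGATION_CUES or token.endswith("n't"):
            in_scope = True  # a cue opens the scope; the cue is not counted
        elif in_scope:
            negated[token] += 1
    return negated


if __name__ == "__main__":
    tweet = "I do not like this phone, but the camera is great."
    print(bag_of_negated_words(tweet))
```

The resulting counts can be vectorized like an ordinary bag of words and concatenated with lexicon-based features before being passed to a classifier.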