Abstract

Traditional approaches to Sentiment Analysis (SA) rely on large annotated data sets or wide-coverage sentiment lexica, and as such often perform poorly on under-resourced languages. This paper presents empirical evidence of an efficient SA approach using freely available machine translation (MT) systems to translate Arabic tweets to English, which we then label for sentiment using a state-of-the-art English SA system. We show that this approach significantly outperforms a number of standard approaches on a gold-standard held-out data set, and performs equally well compared to more cost-intensive methods, with 76% accuracy. This confirms MT-based SA as a cheap and effective alternative to building a fully fledged SA system when dealing with under-resourced languages.
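
The translate-then-label approach described above can be sketched as a two-stage pipeline. This is only an illustrative sketch, not the authors' implementation: the function names, the three-way label set, and the toy lexicon are all assumptions, and the toy translator and classifier merely stand in for a real MT system and a real English SA system so that the example runs offline.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def mt_based_sentiment(
    tweets: Iterable[str],
    translate: Callable[[str], str],
    classify_english: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Translate each source-language tweet, then label the translation.

    Returns (original tweet, English translation, sentiment label) triples.
    """
    results = []
    for tweet in tweets:
        english = translate(tweet)          # stage 1: machine translation
        label = classify_english(english)   # stage 2: English sentiment analysis
        results.append((tweet, english, label))
    return results


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted labels that match the gold-standard labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty data set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical stand-ins: a two-word lexicon is NOT a real MT system,
# and a keyword rule is NOT a real sentiment classifier.
TOY_LEXICON = {"رائع": "wonderful", "سيء": "bad"}


def toy_translate(text: str) -> str:
    """Word-by-word lookup; unknown tokens are passed through unchanged."""
    return " ".join(TOY_LEXICON.get(token, token) for token in text.split())


def toy_classify(english: str) -> str:
    """Keyword rule over an assumed positive/negative/neutral label set."""
    tokens = english.lower().split()
    if "wonderful" in tokens:
        return "positive"
    if "bad" in tokens:
        return "negative"
    return "neutral"
```

In practice, `translate` and `classify_english` would wrap whichever MT service and English SA system are available; keeping them as plain callables means either stage can be swapped without touching the pipeline or the evaluation code.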

Highlights

  • Over the past decade, there has been a growing interest in collecting, processing and analysing user-generated text from social media using Sentiment Analysis (SA)

  • Arabic SA faces a number of challenges: first, Arabic used in social media is usually a mixture of Modern Standard Arabic (MSA) and one or more of its dialects (DAs)

  • The fully-supervised machine learning (ML) baseline uses a freely available corpus of gold-standard annotated Arabic tweets (Refaee and Rieser, 2014c) to train a classifier using word n-grams and Support Vector Machines (SVM)


Summary

Introduction

There has been a growing interest in collecting, processing and analysing user-generated text from social media using Sentiment Analysis (SA). Arabic used in social media is usually a mixture of Modern Standard Arabic (MSA) and one or more of its dialects (DAs); standard toolkits for Natural Language Processing (NLP) mainly cover MSA and perform poorly on DAs. These tools are vital for the performance of machine learning (ML) approaches to Arabic SA: traditionally, ML approaches use a “bag of words” (BOW) model. For morphologically rich languages, such as Arabic, a mixture of stemmed tokens and morphological features has been shown to outperform BOW approaches (Abdul-Mageed et al., 2011; Mourad and Darwish, 2013), accounting for the fact that Arabic contains a very large number of inflected words.
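
To make the BOW model concrete, the following minimal sketch counts word n-grams in a text. It assumes whitespace tokenisation and lowercasing, neither of which is specified in the source, and the function names are hypothetical. Because it treats every surface form as a separate feature, it also hints at why plain BOW struggles with heavily inflected languages: each inflected variant of a word gets its own count.

```python
from collections import Counter
from typing import List


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous word n-grams, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bow_features(text: str, max_n: int = 2) -> Counter:
    """Count word n-grams for n = 1..max_n.

    Tokenisation is a plain whitespace split (an assumption); no stemming
    or morphological analysis is applied.
    """
    tokens = text.lower().split()
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))
    return features
```

Feature counts of this kind are what a classifier such as an SVM would typically be trained on; replacing the whitespace split with a stemmer or morphological analyser is where the richer features mentioned above would enter.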
