Sentence-Level Sarcasm Detection in English and Filipino Tweets

Mary Jane C Samonte, Caroline B Soriano, Carl Justine T Dollete, Maristela Louise C Flores, Paolo Mikkael M Capanas

doi:10.1145/3288155.3288172

Abstract

Sarcasm is a special form of sentiment, defined as a nuanced use of language in which individuals say the opposite of what is implied. In this study, the researchers collected 6,000 Tagalog tweets and 6,000 English tweets from the microblogging site Twitter and had them annotated by linguistic experts. The tweets were classified using Naive Bayes, Support Vector Machine, and Maximum Entropy supervised machine learning algorithms to develop a model. The study focused on sentence-level sarcasm detection, using lexical, pragmatic, hyperbole, quotation, and punctuation-mark features extracted from the tweets. The collected tweets concerned government, politics, weather, social media, and public transport, and were posted within 2 to 5 years of this research. Classification of both the English and the Filipino tweets yielded significant results in the experiments. Each algorithm demonstrated its own distinct strengths and weaknesses on both the English and Filipino datasets. The study concludes that expert annotation and a balanced dataset play an important role in the results of sarcasm detection.
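
As an illustration of the kind of surface features named above (punctuation marks, quotations, hyperbole), the following is a minimal sketch of a per-tweet feature extractor. It is not the authors' implementation: the function name `extract_features`, the feature names, and the small `INTENSIFIERS` word list are assumptions made for this example, and the abstract does not specify the lexicon or feature definitions actually used.

```python
import re

# Hypothetical intensifier list standing in for a hyperbole lexicon;
# the actual lexicon used in the study is not given in the abstract.
INTENSIFIERS = {"so", "totally", "absolutely", "really", "best", "worst", "ever"}


def extract_features(tweet: str) -> dict:
    """Count simple punctuation, quotation, and hyperbole cues in one tweet."""
    # Lowercased word tokens, used only for the intensifier lookup.
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return {
        # Punctuation-mark cues
        "exclamations": tweet.count("!"),
        "questions": tweet.count("?"),
        "ellipses": len(re.findall(r"\.{3,}", tweet)),
        # Quotation cue: spans wrapped in straight double quotes
        "quoted_spans": len(re.findall(r'"[^"]+"', tweet)),
        # Hyperbole cues: intensifier words and fully capitalized words
        "intensifiers": sum(1 for t in tokens if t in INTENSIFIERS),
        "all_caps_words": len(re.findall(r"\b[A-Z]{2,}\b", tweet)),
    }


example = 'Oh GREAT, another "fast" train... best commute ever!!'
features = extract_features(example)
print(features)
```

Counts like these could then be passed as a feature vector to any of the supervised classifiers mentioned in the abstract; how the original study encoded and combined its features is not stated here.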

Full Text