Sarcasm and Irony Detection in English Tweets

Jona Dimovska, Dejan Gjorgjevikj, Gjorgji Madjarov, Marina Angelovska

doi:10.1007/978-3-030-00825-3_11

Abstract

This paper describes an approach to sarcasm and irony detection in English tweets. Accurate detection of sarcasm and irony in text is crucial for numerous NLP applications such as sentiment analysis, opinion mining and text summarization. Detecting irony and sarcasm in microblogging posts can be even more challenging because of the restricted message length, the informal language, and the emoticons and hashtags used. In our approach we combined a variety of standard lexical and syntactic features with specific features for capturing figurative content. All experiments were performed with supervised learning, using different approaches to text preprocessing and feature extraction and four different classifiers. The corpus was taken from the SemEval-2018 challenge and contains 3834 different tweets. The performance of the different approaches is reported and discussed. The results show that text preprocessing has very little impact on performance, while word and sub-word frequencies are the most useful features for determining irony in tweets. A separate experiment, in the form of a survey, was also conducted in which human participants were challenged to label 20 tweets from the dataset as ironic or not. The obtained results suggest that accurate irony detection in tweets can be a hard task even for humans.
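To make the notion of word and sub-word frequency features concrete, the following is a minimal sketch of how such counts could be extracted from a single tweet. It is not the authors' implementation: the tokenization pattern, the choice of unigrams and character trigrams, the `w:`/`c:` key prefixes, the function names and the example tweet are all assumptions made for illustration.

```python
from collections import Counter
import re


def word_ngrams(text, n=1):
    """Count lowercase word n-grams; hashtags and mentions keep their prefix."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


def char_ngrams(text, n=3):
    """Count character n-grams, a simple form of sub-word feature."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def extract_features(text):
    """Merge word and sub-word counts into one sparse feature dictionary."""
    features = Counter()
    for gram, count in word_ngrams(text, 1).items():
        features["w:" + gram] = count  # hypothetical prefix for word features
    for gram, count in char_ngrams(text, 3).items():
        features["c:" + gram] = count  # hypothetical prefix for sub-word features
    return features


# Hypothetical example tweet, not taken from the SemEval-2018 dataset.
features = extract_features("Great, another Monday #not")
```

Dictionaries of this kind could then be vectorized and passed to any supervised classifier; the n-gram sizes here are arbitrary and would normally be tuned.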

Full Text