Sarcasm detection in mash-up language using soft-attention based bi-directional LSTM and feature-rich CNN

Deepak Jain,Akshi Kumar,Geetanjali Garg

doi:10.1016/j.asoc.2020.106198

Abstract

Analyzing explicit and clear sentiment is challenging owing to the growing use of emblematic and multilingual language constructs. This research proposes sarcasm detection using deep learning in code-switch tweets, specifically the mash-up of English with Indian native language, Hindi. The proposed model is a hybrid of bidirectional long short-term memory with a softmax attention layer and convolution neural network for real-time sarcasm detection. To evaluate the performance of the proposed model, real-time mash-up tweets are extracted on the trending political (#government) and entertainment (#cricket, #bollywood) posts on Twitter. The randomly sampled dataset contains 3000 sarcastic and 3000 non-sarcastic bilingual Hinglish (Hindi + English) tweets. Feature engineering is done using pre-trained GloVe word embeddings to extract English semantic context vector, hand-crafted features using subjective lexicon Hindi-SentiWordNet to generate the SentiHindi feature vector and an auxiliary pragmatic feature vector depicting the count of pragmatic markers in tweet. Performance analysis is done to compare and validate the proposed softAttBiLSTM-feature-richCNN model. The model outperforms the baseline deep learning models with a superior classification accuracy of 92.71% and F-measure of 89.05%.

Full Text