Abstract
Sarcasm identification in textual data is an active area of current research. It is a challenging task for humans and computers alike. In this paper, we identify sarcasm in the headlines of two of the most-read Hindi newspapers in India, namely Hindustan and Dainik Jagran. We initially collected 88,518 Hindi newspaper headlines and identified 1,945 of them as sarcastic; these form the dataset for the present study. The headlines belong to the political domain and were published during the Legislative Assembly Elections of 2020, 2021 and 2022. Various machine learning and deep learning techniques have been used to develop the baseline models. The study supports the assumption that sarcastic text does not always bear a negative sentiment; depending on the context, it may bear a positive sentiment. The paper has three aims: creating a dataset of 1,945 Hindi newspaper headlines; training and testing machine learning and deep learning models for sarcasm identification on this dataset, namely Extra Trees Classifier, Random Forest Classifier, XGBClassifier, fasttext-stackedTCN and mBERT-stackedTCN; and comparing the results obtained by these models. Of all the chosen models, the Random Forest Classifier performs best, with an F1 score of 92.11 before data augmentation and 90.68 after data augmentation.