Abstract
Automatic sarcasm detection in textual data is a crucial task in sentiment analysis. The problem is complex because sarcastic comments usually carry the opposite of their literal meaning and depend on context. Detecting sarcasm in comments written in Perso-Arabic-scripted Urdu is even more challenging because few linguistic resources are available online. In this research, we propose Tanz-Indicator, a lexicon-based framework for detecting sarcasm in user comments posted in Perso-Arabic-scripted Urdu. We use a lexicon of over 3000 sarcastic tweets and 100 sarcastic features for experimentation. We also train two machine learning models on the same data to compare the lexicon-based model with machine learning-based models. The results show that the lexicon-based model correctly identified 48.5% sarcastic and 23.5% nonsarcastic tweets, with a recall of 69.6% and a precision of 87.9%. The Naïve Bayes and SVM-based machine learning models achieved recall of 20.1% and 24.4%, with overall accuracy of 65.2% and 60.1%, respectively.
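The reported figures for the lexicon-based model can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that 48.5% and 23.5% are the true-positive and true-negative shares of the whole evaluation set, and it uses the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name `implied_confusion_shares` is hypothetical.

```python
def implied_confusion_shares(tp_share, tn_share, precision, recall):
    """Back out false-positive and false-negative shares.

    Assumption (not stated in the abstract): tp_share and tn_share are
    fractions of the full evaluation set.
    """
    # precision = TP / (TP + FP)  =>  FP = TP / precision - TP
    fp_share = tp_share / precision - tp_share
    # recall = TP / (TP + FN)  =>  FN = TP / recall - TP
    fn_share = tp_share / recall - tp_share
    return {"tp": tp_share, "tn": tn_share, "fp": fp_share, "fn": fn_share}


# Figures reported in the abstract for the lexicon-based model.
shares = implied_confusion_shares(0.485, 0.235, precision=0.879, recall=0.696)

# If the assumption holds, the four shares should sum to roughly 1.
total = sum(shares.values())

# Implied overall accuracy under the same assumption: TP share + TN share.
accuracy = shares["tp"] + shares["tn"]

print(f"implied FP share: {shares['fp']:.3f}")
print(f"implied FN share: {shares['fn']:.3f}")
print(f"sum of shares:    {total:.3f}")
print(f"implied accuracy: {accuracy:.3f}")
```

Under this reading, the four shares sum to very nearly 1, so the reported recall and precision are mutually consistent with the two percentages. The implied accuracy would be 48.5% + 23.5% = 72%, a derived figure that the abstract does not state.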