Abstract
With advances in information and communication technologies, an immense amount of information is shared on social media and microblogging platforms. Much of this online content contains elements of figurative language, such as irony, sarcasm, and satire. The automatic identification of figurative language is a challenging task in natural language processing, because linguistic devices such as metaphor, analogy, ambiguity, irony, sarcasm, and satire are used to express more complex meanings. The predictive performance of sentiment classification schemes may degrade if figurative language within the text is not properly addressed. Satire is a form of figurative communication in which ideas or opinions about a person, event, or issue are expressed humorously in order to criticize that entity. Satirical news can be deceptive and harmful. In this paper, we present a machine learning based approach to satire detection in Turkish news articles. In the presented scheme, we utilize three kinds of features to model lexical information, namely unigrams, bigrams, and trigrams. For each feature set, three weighting schemes are considered: term frequency, term presence, and TF-IDF. In the classification phase, Naive Bayes, support vector machines, logistic regression, and C4.5 algorithms are examined.
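As a rough illustration of the feature representations named above, the following minimal sketch extracts word unigrams, bigrams, and trigrams and applies the three weighting schemes. It is not the implementation used in the paper: the function names, the whitespace tokenizer, the English toy sentences, and the plain log(N/df) form of IDF are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=3):
    """Count unigrams, bigrams and trigrams in a text.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever tokenization the original study applied.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def weight(docs, scheme="tfidf"):
    """Map each document to a dict of n-gram weights.

    scheme: "tf" (raw term frequency), "tp" (term presence, 0/1)
    or "tfidf" (tf * log(N / df), one common variant; assumed here).
    """
    counts = [extract_features(doc) for doc in docs]
    if scheme == "tf":
        return [dict(c) for c in counts]
    if scheme == "tp":
        return [{term: 1 for term in c} for c in counts]
    if scheme == "tfidf":
        n_docs = len(docs)
        # Document frequency: number of documents containing each n-gram.
        df = Counter(term for c in counts for term in c)
        return [
            {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
            for c in counts
        ]
    raise ValueError(f"unknown weighting scheme: {scheme}")


# Hypothetical toy corpus; the study itself uses Turkish news articles.
toy_docs = ["the mayor bans rain", "the mayor opens a park"]
tfidf_vectors = weight(toy_docs, scheme="tfidf")
```

Each resulting dictionary can be converted to a sparse vector and passed to any of the classifiers listed above. Under this IDF variant, n-grams that occur in every document receive a weight of zero, which is one reason the three weighting schemes can lead to different classification results.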