Abstract
Social media and microblogging platforms commonly contain figurative and nonliteral language, including satire. Identifying figurative language is a fundamental task for sentiment analysis: sentiment analysis methods cannot reach high classification accuracy unless elements of figurative language are properly identified. Satire is a kind of figurative language in which irony and humor are used to ridicule or criticize an event or entity. Satirical news is pervasive on social media platforms and can be deceptive and harmful. This paper presents an ensemble scheme for satirical news identification in Turkish news articles. In the presented scheme, linguistic and psychological feature sets are extracted, namely linguistic, psychological, personal, spoken categories, and punctuation. In the classification phase, the accuracy rates of five supervised learning algorithms (naive Bayes, logistic regression, support vector machines, random forest, and k-nearest neighbor) are evaluated together with three widely used ensemble methods (AdaBoost, bagging, and random subspace). Among these, the random forest algorithm yielded the highest performance, with a classification accuracy of 96.92% for satire detection in Turkish. Among the deep learning-based architectures, a recurrent neural network with an attention mechanism achieved a classification accuracy of 97.72%.
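To make the ensemble idea concrete, the following is a minimal sketch of the random subspace method, one of the three ensemble methods named above. It is not the authors' implementation. The nearest-centroid base learner stands in for the five classifiers the paper evaluates, and the four-column feature vectors and their values are invented purely for illustration. The class and function names (`NearestCentroid`, `RandomSubspace`, `centroid`) are likewise hypothetical.

```python
import random
from collections import Counter


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


class NearestCentroid:
    """Tiny stand-in base learner (assumption: the paper uses other classifiers)."""

    def fit(self, X, y):
        # One centroid per class label.
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)
        }
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))

        # Sorting the labels makes tie-breaking deterministic.
        return min(sorted(self.centroids), key=lambda lab: sq_dist(self.centroids[lab]))


class RandomSubspace:
    """Trains each ensemble member on a random subset of the feature columns."""

    def __init__(self, n_estimators=5, subspace_size=2, seed=0):
        self.n_estimators = n_estimators
        self.subspace_size = subspace_size
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = len(X[0])
        self.members = []
        for _ in range(self.n_estimators):
            # Draw feature indices without replacement for this member.
            idx = sorted(rng.sample(range(n_features), self.subspace_size))
            projected = [[row[i] for i in idx] for row in X]
            self.members.append((idx, NearestCentroid().fit(projected, y)))
        return self

    def predict_one(self, x):
        # Majority vote over the members, each seeing only its own columns.
        votes = Counter(
            model.predict_one([x[i] for i in idx]) for idx, model in self.members
        )
        return votes.most_common(1)[0][0]


# Invented toy data: four hypothetical feature scores per article.
X = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.9, 0.8, 0.7],
    [0.1, 0.2, 0.1, 0.2],
    [0.2, 0.1, 0.3, 0.1],
]
y = ["satire", "satire", "news", "news"]

model = RandomSubspace(n_estimators=5, subspace_size=2, seed=0).fit(X, y)
print(model.predict_one([0.85, 0.9, 0.75, 0.8]))
```

Because each member sees only part of the feature space, the members tend to make different errors, and the majority vote can be more robust than any single member. In practice a library implementation with a stronger base learner would normally be used instead of this hand-rolled version.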
Highlights
Recent advances in information technologies have made it possible to access immense quantities of unstructured online text documents
In the predictive performance values obtained with distinct psychological and linguistic feature sets for satire identification using supervised learning algorithms (Table 5), random forest usually achieves higher classification accuracies and AUC values than the other classification algorithms
The experimental evaluation aims to analyze the predictive performance of LIWC-based feature sets and their ensembles for Turkish satire identification
Summary
Recent advances in information technologies have made it possible to access immense quantities of unstructured online text documents. The online content available on the Web contains figurative and nonliteral linguistic elements, including metaphor, analogy, irony, and satire. Irony, sarcasm, or satire may be employed to express more complex issues. Identifying figurative language can be a challenging problem for computers and for human beings alike, yet it is a fundamental task for sentiment analysis. The classification accuracy of sentiment analysis schemes may be degraded if elements of figurative language have not been properly addressed [1, 2].