Abstract
Automated sentiment analysis of textual data is one of the central and most challenging tasks in political communication studies. However, the available toolkits are designed primarily for English texts and require contextual adaptation to produce valid results, especially for morphologically rich languages such as Hungarian. This study introduces (1) a new sentiment and emotion annotation framework that uses inductive approaches to identify emotions in the corpus and aggregate these emotions into positive, negative, and mixed sentiment categories; (2) a manually annotated sentiment data set of 5700 political news sentences; and (3) a new Hungarian sentiment dictionary for political text analysis, created via word embeddings, whose performance was compared with that of other available sentiment dictionaries. (4) Because of the limitations of dictionary-based sentiment analysis, we also applied various machine learning algorithms to our data set. (5) Finally, to move towards state-of-the-art approaches, we fine-tuned the Hungarian BERT-base model for sentiment analysis. We also tested how different pre-processing steps affect the performance of machine learning algorithms on Hungarian texts.