The growing number of digitally accessible text corpora and the accelerating development of NLP tools and methods, particularly the emergence of powerful large-scale language models, have enabled the widespread use of these models in various classification tasks, including the vast field of sentiment analysis. However, such models must often be fine-tuned to perform this task effectively. We therefore set out to create a fine-tuned transformer-based model for the emotion and sentiment analysis of Hungarian political texts. The model was trained on manually annotated Hungarian parliamentary speeches from 2014 to 2018, which have the advantage of being rich in a wide range of emotions. The compiled corpus is freely available for research purposes. We describe in detail the process of fine-tuning the Hungarian BERT model for sentiment and emotion classification, the performance achieved, and the typical classification errors, most of which stem from the fine-tuned models' failure to recognize pragmatic and other language-use features.
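As a rough illustration of the fine-tuning step described above, the sketch below shows how a Hungarian BERT checkpoint could be fine-tuned for multi-class emotion classification with the Hugging Face Transformers library. The checkpoint ID (SZTAKI-HLT/hubert-base-cc), the label count, and all hyperparameters are illustrative assumptions, not the authors' actual configuration.

    # Minimal sketch (not the authors' code): fine-tuning a Hungarian BERT
    # checkpoint for emotion classification with Hugging Face Transformers.
    import torch
    from torch.utils.data import Dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              TrainingArguments, Trainer)

    MODEL_NAME = "SZTAKI-HLT/hubert-base-cc"  # huBERT; assumed checkpoint ID
    NUM_LABELS = 8                            # placeholder; use the size of the
                                              # paper's emotion label set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_LABELS)

    class SpeechDataset(Dataset):
        """Wraps annotated speech segments as (input_ids, label) items."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True,
                                 padding="max_length", max_length=256)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    # Toy stand-in for the manually annotated 2014-2018 parliamentary corpus.
    train_ds = SpeechDataset(["Példa felszólalás ..."], [0])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="hubert-emotion",
                               num_train_epochs=3,
                               per_device_train_batch_size=16,
                               learning_rate=2e-5),
        train_dataset=train_ds,
    )
    trainer.train()

In practice, the annotated corpus would replace the toy dataset, and a held-out split would be passed to the Trainer for evaluation; the hyperparameter values shown are common defaults for BERT-style fine-tuning, not those reported in the paper.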