Abstract

Propaganda in the digital era is often associated with online news. In this study, we examined how well large language models detect propaganda techniques in the electronic press, to investigate whether they are a viable replacement for human annotators. We prepared prompts for generative pre-trained transformer models to find the spans in news articles where propaganda techniques appear and to name those techniques. Our study was divided into three experiments on different datasets: two based on the annotated SemEval2020 Task 11 corpus and one on an unannotated subset of the Polish Online News Corpus, which we claim poses an even greater challenge because Polish is an under-resourced language. Our reproduction of the first experiment yielded a recall of 64.53%, higher than in the original run, and the highest precision, 81.82%, was achieved by gpt-4-1106-preview CoT. None of our attempts outperformed the baseline F1 score. One attempt with gpt-4-0125-preview on the original SemEval2020 Task 11 data achieved an F1 score of almost 20%, but this was still below the baseline, which oscillated around 50%. The part of our work dedicated to Polish articles showed that gpt-4-0125-preview reached 74% accuracy in the binary detection of propaganda techniques and 69% in propaganda technique classification. The results for SemEval2020 show that the outputs of generative models tend to be unpredictable and are hardly reproducible for propaganda detection. For the time being, these methods are unreliable for this task, but we believe they can help to generate more training data.
