War of Words: Harnessing the Potential of Large Language Models and Retrieval Augmented Generation to Classify, Counter and Diffuse Hate Speech

Rohan Leekha, Olga Simek, Charlie Dagli

doi:10.32473/flairs.37.1.135484

Abstract

This paper explores the divergent narratives that emerged in the wake of the Russia-Ukraine war, which began on February 24, 2022, and the innovative application of AI language models, specifically Retrieval-Augmented Generation (RAG) and instruction-based large language models (LLMs), to countering hate speech on social media. We design a pipeline that automatically discovers, and then responds to, hateful content trending on social media platforms. Monitoring via traditional topic or narrative modeling often focuses on low-level content that is difficult to interpret, and workflows for prioritization and response generation are often highly manual. We use several LLMs throughout our pipeline to detect and summarize topics, to determine whether tweets contain hate speech, and to generate counter-narratives. We test our approach on the Ukraine Bio-Lab Tweet Corpus of 500k tweets and evaluate counter-narrative generation across four dimensions: relevance, grammaticality, factuality, and diversity. Our approach outperforms existing state-of-the-art algorithms for hate speech detection, and the promising counter-narrative generation scores across our metrics reflect the effectiveness of our pipeline in addressing hateful social media posts.
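The abstract does not name the models, prompts, retriever, or exact diversity metric used, so the following is only a minimal sketch, under stated assumptions, of how the classify, retrieve, and generate stages of such a pipeline could fit together. It omits the topic discovery and summarization stage. Every name here (`CounterSpeechPipeline`, `retrieve`, `distinct_n`, the toy knowledge base and stubs) is hypothetical: the LLM classifier and generator are injected as plain functions, word-overlap scoring stands in for a real dense retriever, and distinct-n is shown as one common diversity measure, not necessarily the one the paper reports.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical interfaces: any LLM backend can be plugged in as a plain function.
Classifier = Callable[[str], bool]           # tweet -> is it hateful?
Generator = Callable[[str, List[str]], str]  # (tweet, retrieved passages) -> counter-narrative

PUNCT = ".,!?\"'"


def tokenize(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for an embedding model."""
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k passages sharing the most words with the query (ties keep corpus order)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def distinct_n(texts: List[str], n: int = 2) -> float:
    """Distinct-n diversity: unique n-grams divided by total n-grams across all outputs."""
    ngrams = []
    for t in texts:
        words = t.lower().split()
        ngrams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


@dataclass
class CounterSpeechPipeline:
    is_hateful: Classifier     # assumed LLM-backed hate speech classifier
    generate: Generator        # assumed instruction-tuned LLM prompted with retrieved passages
    knowledge_base: List[str]  # assumed corpus of grounding documents for RAG
    k: int = 2

    def run(self, tweets: List[str]) -> Dict[str, str]:
        """Map each hateful tweet to a retrieval-grounded counter-narrative."""
        responses = {}
        for tweet in tweets:
            if not self.is_hateful(tweet):
                continue  # non-hateful posts get no response
            passages = retrieve(tweet, self.knowledge_base, self.k)
            responses[tweet] = self.generate(tweet, passages)
        return responses


if __name__ == "__main__":
    pipeline = CounterSpeechPipeline(
        is_hateful=lambda t: t.startswith("HATEFUL:"),  # toy stand-in for an LLM classifier
        generate=lambda t, ps: "Counterpoint, citing: " + "; ".join(ps),  # toy stand-in for an LLM
        knowledge_base=[
            "placeholder doc about labs and inspections",
            "placeholder doc about public health labs",
            "placeholder doc about weather forecasts",
        ],
    )
    out = pipeline.run(["HATEFUL: the labs are a secret plot", "nice weather today"])
    for tweet, reply in out.items():
        print(tweet, "->", reply)
    print("distinct-2:", distinct_n(list(out.values())))
```

Injecting the classifier and generator as functions keeps the control flow independent of any particular LLM provider, so the same loop could wrap a fine-tuned classifier or a prompted instruction model.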

Full Text