Abstract

This paper investigates the use of large language models (LLMs) for moderating online discussions, focusing on identifying user intent across various types of content. It centers on natural language processing (NLP) techniques for detecting toxic language, conversational derailment, and problematic comments. We build prototypes of such moderation tools using LLMs and evaluate them on datasets in both English and German, since effectiveness may vary across languages. The research explores content classification through methods such as sentiment analysis, keyword extraction, and topic modeling, employing non-binary labeling for a more nuanced analysis of online interactions. The paper also discusses the limitations of current LLMs, including false positives arising from limited training data. It concludes with directions for improving model fine-tuning to better address platform-specific needs and linguistic variation. This work contributes to understanding how AI can support moderation decisions in online spaces and foster healthier digital communication environments.
