Abstract

This paper investigates the use of large language models (LLMs) for moderating online discussions, focusing on identifying user intent across various types of content. It centers on natural language processing (NLP) techniques for detecting toxic language, conversational derailment, and problematic comments. We build prototypes of such moderation tools using LLMs and evaluate them on datasets in both English and German, since effectiveness may vary across languages. The research explores content classification through methods such as sentiment analysis, keyword extraction, and topic modeling, employing non-binary labeling for a more nuanced analysis of online interactions. The paper also discusses the limitations of current LLMs, including false positives arising from limited training data. It concludes with directions for improving model fine-tuning to better address platform-specific needs and linguistic variation. This work contributes to understanding how AI can support moderation decisions in online spaces and foster healthier digital communication environments.
