Clustering analysis of online discussion participants

Peter Krammer, Marcel Kvassay, Ján Mojžiš, Ivana Budinská, Ladislav Hluchý, Marek Jurkovič

doi:10.1016/j.procs.2018.07.161

Abstract

In this paper, we perform density-based clustering of discussion participants from online editions of two major Slovak national newspapers, Sme.sk and Cas.sk. We use language-independent statistical attributes characterising the participants' communication patterns and the content of their posts. For each newspaper, we separately analyse two categories of news (domestic and international). A large majority of participants in each dataset was found to belong to one stable and dominant cluster present in all our datasets. We interpret it as comprising the “standard” or “average” discussion participants. The remaining participants could be viewed as various kinds of “anomalies” or “departures from normal” (not necessarily negative) and were assigned to several minor clusters. The shape and position of some minor clusters generalised well across the datasets. Overall, we found significant structural similarities between the four datasets in terms of histograms of attributes, the existence of one stable and dominant cluster, and the similar shape and location of several minor clusters. This is a significant result given that the four datasets were largely independent and the two newspapers adopted radically different policies for dealing with karma and foul language. The proposed approach therefore looks very promising as a means of identifying anomalous behaviour on diverse online discussion platforms.

Full Text