Quantifying mood, content and dynamics of health forums

Ivan Rivera, James Curran, Jim Warren

doi:10.1145/2843043.2843379

Abstract

In this paper we examined the content, mood and general dynamics of health forum discussions concerning vaccinations, genetically modified organisms (GMO) and a gluten-free diet, and explored the ability to extract sentiment from social media. Using data from the social media website Reddit.com, we applied text mining techniques together with machine learning algorithms to derive insights. We combined metadata from the source, text features, Latent Dirichlet Allocation (LDA) topic model outputs and manually annotated disposition labels, which separate comments into affirmative or negative groups, with Gradient Boosted Models (GBM) to devise a set of disposition models. These models infer commentators' sentiment towards each topic and expand our understanding of the relevant arguments. Manual annotation yielded only moderate inter-rater agreement, with an average Fleiss' kappa of 0.48. Despite this, the disposition models for each topic achieved balanced successful prediction rates of between 68% and 74%, providing a considerably better-than-chance assessment of a commentator's disposition towards each topic. We observed changes in disposition over time and found areas of disagreement between the supporters and opponents of each topic. Despite the limitations associated with manual annotation, we obtained a wider view of the issues concerning the topics of interest than that offered by previous research.
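The two evaluation figures quoted above can be illustrated with a minimal sketch. It assumes that every comment was labelled by the same number of annotators and that the "balanced successful prediction rate" corresponds to the mean of per-class recall (balanced accuracy); neither assumption is confirmed by the abstract. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that each received the same number of ratings.

    ratings: list of items, each a list of labels (one label per annotator).
    categories: list of all possible labels.
    Undefined (division by zero) if every rating falls in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of annotators who gave item i the label categories[j]
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement for each item, then averaged over items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Agreement expected by chance from the overall label proportions
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; assumed reading of 'balanced prediction rate'."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): three annotators labelling four comments
toy_ratings = [
    ["affirmative", "affirmative", "affirmative"],
    ["negative", "negative", "affirmative"],
    ["negative", "negative", "negative"],
    ["affirmative", "negative", "affirmative"],
]
toy_kappa = fleiss_kappa(toy_ratings, ["affirmative", "negative"])
toy_bal_acc = balanced_accuracy(
    ["affirmative", "affirmative", "affirmative", "negative"],
    ["affirmative", "affirmative", "negative", "negative"],
)
```

Balanced accuracy weights each class equally, so a model that always predicts the majority disposition scores 50% on a two-class problem regardless of class imbalance, which is why rates of 68% to 74% can be read as better than chance.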

Full Text